Tokenize sentences using a tagger

Usage

tokenize(
  x,
  text_field = "text",
  docid_field = "doc_id",
  split = FALSE,
  mode = c("parse", "wakati"),
  tagger
)

Arguments

x

A data.frame-like object or a character vector to be tokenized.

text_field

<data-masked> String or symbol; column containing texts to be tokenized.

docid_field

<data-masked> String or symbol; column containing document IDs.

split

Logical; when TRUE, the function internally splits the sentences into sub-sentences before tokenizing.

mode

Character scalar to switch the output format; either "parse" or "wakati".

tagger

A tagger function created by create_tagger().

Value

A tibble or a named list of tokens.
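
Examples

A minimal usage sketch, not taken from the package's reference: the corpus columns, the example texts, and the no-argument call to create_tagger() are illustrative assumptions; consult create_tagger() for its actual interface.

# Assumed: the package providing tokenize() and create_tagger() is attached,
# and create_tagger() can build a default tagger without arguments.
tagger <- create_tagger()

# A small corpus with document IDs and texts (illustrative data).
corpus <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("This is the first document.", "Here is the second one.")
)

# Tokenize the data frame; text_field and docid_field name the relevant columns.
tokens <- tokenize(corpus, text_field = "text", docid_field = "doc_id", tagger = tagger)

# A character vector also works; mode = "wakati" switches the output format.
wakati <- tokenize(c(d1 = "One more sentence."), mode = "wakati", tagger = tagger)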