Tokenize sentences using a tagger
Usage
tokenize(
  x,
  text_field = "text",
  docid_field = "doc_id",
  split = FALSE,
  mode = c("parse", "wakati"),
  tagger
)
Arguments
- x
A data.frame-like object or a character vector to be tokenized.
- text_field
<data-masked> String or symbol; column containing the texts to be tokenized.
- docid_field
<data-masked> String or symbol; column containing the document IDs.
- split
Logical. When passed as TRUE, the function internally splits the sentences into sub-sentences.
- mode
Character scalar to switch the output format; one of "parse" or "wakati".
- tagger
A tagger function created by create_tagger().
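Both text_field and docid_field accept either a string ("text") or a bare symbol (text). The base R sketch below illustrates that idea with a hypothetical helper, resolve_column(), which is not part of this package. It assumes only strings and bare symbols are passed; full data masking (as implemented in rlang) also supports other forms, which this sketch does not cover.

```r
# Hypothetical helper (NOT part of the package): resolve a column given
# either as a string ("text") or as a bare symbol (text).
# Assumption: only strings and bare symbols are passed.
resolve_column <- function(data, field) {
  expr <- substitute(field)          # capture the argument unevaluated
  name <- if (is.character(expr)) {
    expr                             # string: use it as the column name
  } else {
    deparse(expr)                    # bare symbol: use its name
  }
  if (!name %in% names(data)) stop("column not found: ", name)
  data[[name]]
}

df <- data.frame(doc_id = c("d1", "d2"), text = c("foo", "bar"))
resolve_column(df, "text")   # string form
resolve_column(df, text)     # symbol form, same column
```

With this sketch, resolve_column(df, "text") and resolve_column(df, text) both return the text column, mirroring the two accepted forms described above.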