Tokenize sentences using a tagger
Usage
tokenize(
  x,
  text_field = "text",
  docid_field = "doc_id",
  split = FALSE,
  mode = c("parse", "wakati"),
  tagger
)
Arguments
- x
A data.frame-like object or a character vector to be tokenized.
- text_field
<data-masked> String or symbol; column containing texts to be tokenized.
- docid_field
<data-masked> String or symbol; column containing document IDs.
- split
Logical. When passed as TRUE, the function internally splits the sentences into sub-sentences.
- mode
Character scalar to switch output format.
- tagger
A tagger function created by create_tagger().
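Examples
A minimal sketch of how these arguments fit together, not an example taken from the package itself: the dictionary path given to create_tagger() is a hypothetical placeholder (check create_tagger()'s own documentation for the arguments it actually requires), and the Japanese sample sentences simply assume a Japanese dictionary, as the "wakati" mode suggests.

# Hypothetical dictionary path; replace with a real one for your setup
tagger <- create_tagger("path/to/dictionary")

# Character vector input: each element is treated as one document
toks <- tokenize(
  c("今日はいい天気です。", "明日は雨が降るかもしれません。"),
  tagger = tagger
)

# data.frame input: pick the text and document-ID columns by name,
# split sentences into sub-sentences, and switch to "wakati" output
df <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("今日はいい天気です。", "明日は雨が降るかもしれません。")
)
toks <- tokenize(
  df,
  text_field = "text",
  docid_field = "doc_id",
  split = TRUE,
  mode = "wakati",
  tagger = tagger
)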