Collapse sequences of tokens by condition — collapse

Concatenates sequences of tokens in a tidy text dataset, grouping them by a logical condition.

Usage

collapse_tokens(tbl, condition, .collapse = "")

Arguments

tbl: A tidy text dataset.
condition: <data-masked> A logical expression.
.collapse: String with which tokens are concatenated.

Value

A data.frame.

Details

Note that this function drops all columns except 'token' and the columns used for grouping sequences. As a result, the returned data.frame has only the 'doc_id', 'sentence_id', 'token_id', and 'token' columns.

Examples

# \donttest{
df <- prettify(head(hiroba), col_select = "POS1")
collapse_tokens(df, POS1 == "\u540d\u8a5e")
#> # A tibble: 5 × 4
#>   doc_id sentence_id token_id token   
#>   <fct>        <int>    <int> <chr>   
#> 1 1                1        1 ポラーノ
#> 2 1                1        2 の      
#> 3 1                1        3 広場    
#> 4 2                2        1 宮沢賢治
#> 5 3                3        1 前      
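# A hedged variant of the call above, reusing `df`: passing a non-empty
# `.collapse` (here "-", chosen only for illustration) should make the
# joins visible, since `.collapse` is the string with which tokens are
# concatenated. The output is not shown because it was not run here.
collapse_tokens(df, POS1 == "\u540d\u8a5e", .collapse = "-")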
# }