Lexical density is the proportion of content words (lexical items) in a document. This function is a simple helper for calculating the lexical density of a given dataset.
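As a formula, for a vector of $n$ tokens $v_1, \dots, v_n$ and a set $C$ of values counted as content words, the basic quantity is shown below. The notation is introduced here only for illustration and ignores `negate`. When `targets` is supplied, the denominator counts only the tokens retained by that filter instead of all $n$ tokens.

$$
\mathrm{LD} = \frac{\sum_{i=1}^{n} \mathbf{1}[v_i \in C]}{n}
$$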
Usage
lex_density(vec, contents_words, targets = NULL, negate = c(FALSE, FALSE))
Arguments
- vec
A character vector.
- contents_words
A character vector of values to be counted as content words.
- targets
A character vector used to filter the values that make up the denominator of lexical density before the value is computed. If NULL (the default), no filtering is applied.
- negate
A logical vector of length 2. When an element is TRUE, the corresponding predicate is negated: the first element applies to counting content words, and the second to filtering by targets.
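The sketch below shows how the four arguments might combine. It is a hypothetical re-implementation named `lex_density_sketch`, written only for illustration, and it is not the package's source. It assumes that content words are counted over the whole vector and that `targets` (possibly negated) only restricts the denominator. The POS labels are made-up English placeholders for the Japanese tags used in Examples.

```r
# Hypothetical re-implementation for illustration only (not the package source).
# Assumption: the numerator counts matches over the whole vector, while
# `targets` only filters the elements counted in the denominator.
lex_density_sketch <- function(vec, contents_words, targets = NULL,
                               negate = c(FALSE, FALSE)) {
  # Numerator: elements matching contents_words (or NOT matching, if negate[1])
  is_content <- vec %in% contents_words
  if (negate[1]) is_content <- !is_content

  # Denominator: every element when targets is NULL, otherwise the elements
  # matching targets (or NOT matching, if negate[2])
  if (is.null(targets)) {
    is_target <- rep(TRUE, length(vec))
  } else {
    is_target <- vec %in% targets
    if (negate[2]) is_target <- !is_target
  }

  sum(is_content) / sum(is_target)
}

# Made-up part-of-speech tags for a single document
pos <- c("noun", "particle", "verb", "noun", "auxiliary")

lex_density_sketch(pos, "noun")                    # 2 nouns / 5 tokens
lex_density_sketch(pos, "noun", c("particle", "auxiliary"),
                   negate = c(FALSE, TRUE))        # 2 nouns / 3 non-function tokens
lex_density_sketch(pos, "verb", "noun")            # 1 verb / 2 nouns
```

With this design, ratios such as verbs per noun can exceed 1. The actual function may differ in detail, so check its behaviour on small inputs before relying on it.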
Examples
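library(gibasa)

# POS tags are written as Unicode escapes:
# \u540d\u8a5e noun, \u52a9\u8a5e particle, \u52a9\u52d5\u8a5e auxiliary verb,
# \u5f62\u5bb9\u8a5e adjective, \u526f\u8a5e adverb, \u9023\u4f53\u8a5e adnominal, \u52d5\u8a5e verb
head(hiroba) |>
  prettify(col_select = "POS1") |>
  dplyr::group_by(doc_id) |>
  dplyr::summarise(
    # nouns over all tokens except particles and auxiliary verbs
    noun_ratio = lex_density(POS1,
      "\u540d\u8a5e",
      c("\u52a9\u8a5e", "\u52a9\u52d5\u8a5e"),
      negate = c(FALSE, TRUE)
    ),
    # modifier-verb ratio: (adjectives + adverbs + adnominals) over verbs
    mvr = lex_density(
      POS1,
      c("\u5f62\u5bb9\u8a5e", "\u526f\u8a5e", "\u9023\u4f53\u8a5e"),
      "\u52d5\u8a5e"
    ),
    # verb-noun ratio: verbs over nouns
    vnr = lex_density(POS1, "\u52d5\u8a5e", "\u540d\u8a5e")
  )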
#> # A tibble: 3 × 4
#> doc_id noun_ratio mvr vnr
#> <fct> <dbl> <dbl> <dbl>
#> 1 1 1 NaN 0
#> 2 2 1 NaN 0
#> 3 3 0 NaN NaN
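Several cells in the output above are NaN. In R arithmetic, a ratio of counts becomes NaN only as `0 / 0`, so each such cell has a zero denominator: no token in that document passed the `targets` filter. If missing values suit later steps better, one option is to convert NaN to NA after summarising. This is a post-processing choice made here for illustration, not something `lex_density` does itself. The values below are made up.

```r
# In R, 0 / 0 evaluates to NaN rather than raising an error;
# a positive count divided by 0 would give Inf instead.
ratios <- c(1, 1, 0 / 0)
print(is.nan(ratios))

# Replace NaN with NA so that later steps treat the value as missing data
ratios[is.nan(ratios)] <- NA_real_
print(ratios)
```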