Calculates and binds the term frequency, inverse document frequency, and TF-IDF of the dataset. This function experimentally supports 4 types of term frequencies and 5 types of inverse document frequencies.
Arguments
- tbl
A tidy text dataset.
- term
<data-masked> Column containing terms.
- document
<data-masked> Column containing document IDs.
- n
<data-masked> Column containing document-term counts.
- tf
Method for computing term frequency.
- idf
Method for computing inverse document frequency.
- norm
Logical; if TRUE, TF-IDF values are normalized by dividing them by their L2 norm.
- rmecab_compat
Logical; if TRUE, values are computed so that they are compatible with 'RMeCab'. Note that 'RMeCab' always computes IDF values using term frequency rather than raw term counts, and thus TF-IDF values may be doubly affected by term frequency.
Details
The type of term frequency is selected with the tf argument:
- tf is term frequency (not the raw count of terms).
- tf2 is logarithmic term frequency with base exp(1).
- tf3 is binary-weighted term frequency.
- itf is inverse term frequency. Use with idf="df".
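The term frequency variants above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: the toy data frame and its column names are hypothetical, and the exact forms of tf2 (taken here as log(n + 1)) and itf (taken here as log(total / n)) are assumed, since only their general character is described above.

```r
# Toy document-term counts in tidy form (hypothetical data).
counts <- data.frame(
  doc  = c("d1", "d1", "d2", "d2", "d2"),
  term = c("cat", "dog", "cat", "fish", "dog"),
  n    = c(3, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Total number of tokens in each document, aligned to the rows.
total <- ave(counts$n, counts$doc, FUN = sum)

# tf: relative term frequency, i.e. the term's share of its document.
counts$tf <- counts$n / total

# tf2: logarithmic term frequency with the natural log
# (assumed form: log(n + 1)).
counts$tf2 <- log(counts$n + 1)

# tf3: binary-weighted term frequency, 1 if the term occurs at all.
counts$tf3 <- as.numeric(counts$n > 0)

# itf: inverse term frequency (assumed form: log(total / n)).
counts$itf <- log(total / counts$n)

print(counts)
```

With tf, the values within each document sum to 1, which is what distinguishes it from the raw count in n.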
The type of inverse document frequency is selected with the idf argument:
- idf is smoothed inverse document frequency with base 2. 'Smoothed' here simply means adding 1 to the raw values after taking the logarithm.
- idf2 is global frequency IDF.
- idf3 is probabilistic IDF with base 2.
- idf4 is global entropy, which is not actually an IDF.
- df is document frequency. Use with tf="itf".
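As a rough sketch of how some of these weights combine into a TF-IDF value, the base R code below computes document frequency, smoothed IDF and probabilistic IDF for a toy dataset, then multiplies relative term frequency by the smoothed IDF and applies L2 normalization. The data and column names are hypothetical. The form of the probabilistic IDF and the choice to normalize within each document are assumptions and may differ from what the function does. idf2 and idf4 are left out because their formulas are not given here.

```r
# Toy document-term counts in tidy form (hypothetical data).
counts <- data.frame(
  doc  = c("d1", "d2", "d3", "d3", "d4"),
  term = c("cat", "cat", "fish", "dog", "dog"),
  n    = c(2, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Number of documents in the collection.
n_docs <- length(unique(counts$doc))

# df: number of distinct documents containing each term.
df <- tapply(counts$doc, counts$term, function(d) length(unique(d)))

# idf: smoothed IDF with base 2 (add 1 after taking the logarithm).
idf <- log2(n_docs / df) + 1

# idf3: probabilistic IDF with base 2 (assumed form).
idf3 <- log2((n_docs - df) / df)

# TF-IDF from relative term frequency and the smoothed IDF.
total <- ave(counts$n, counts$doc, FUN = sum)
counts$tf_idf <- (counts$n / total) * as.numeric(idf[counts$term])

# L2 normalization, assumed here to be applied within each document.
l2 <- sqrt(ave(counts$tf_idf^2, counts$doc, FUN = sum))
counts$tf_idf_norm <- counts$tf_idf / l2

print(counts)
```

After normalization, the squared values within each document sum to 1, so documents of different lengths become directly comparable.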