Changelog
Source: NEWS.md
gibasa 1.1.0
CRAN release: 2024-02-17
- Corrected probabilistic IDF calculation by `global_idf3`.
- Refactored `bind_tf_idf2`.
  - Breaking Change: Changed behavior when `norm=TRUE`. Cosine normalization is now performed on `tf_idf` values as in the RMeCab package.
  - Added `tf="itf"` and `idf="df"` options.
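To illustrate what cosine normalization on `tf_idf` values means, here is a minimal base R sketch. It does not call gibasa and is not the package's implementation: the toy data, the column names `tf_idf_norm` and `doc_freq`, the helper `l2`, and the smoothed IDF formula are all assumptions chosen for illustration.

```r
# Toy term counts in tidy form: one row per (document, term) pair.
counts <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d2"),
  token  = c("cat", "dog", "cat", "fish", "bird"),
  n      = c(2, 1, 1, 3, 1)
)

# Term frequency: each count divided by the document's total count.
counts$tf <- counts$n / ave(counts$n, counts$doc_id, FUN = sum)

# Document frequency: number of rows (documents) in which each term appears.
n_docs <- length(unique(counts$doc_id))
doc_freq <- ave(counts$n, counts$token, FUN = length)

# Smoothed IDF (an assumption for this sketch, not gibasa's formula).
counts$idf <- log(n_docs / doc_freq) + 1

counts$tf_idf <- counts$tf * counts$idf

# Cosine (L2) normalization applied to the tf_idf values within each document.
l2 <- function(x) x / sqrt(sum(x^2))
counts$tf_idf_norm <- ave(counts$tf_idf, counts$doc_id, FUN = l2)

print(counts)
```

After this step, the `tf_idf_norm` values of each document form a unit-length vector, so documents of different lengths become directly comparable.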
gibasa 0.9.5
CRAN release: 2023-07-09
- Removed audubon dependency for maintainability.
- `pack` now preserves the `doc_id` type when it is a factor.
gibasa 0.9.4
CRAN release: 2023-06-03
- Updated Makevars for Unix-alikes. Users can now use a file specified by the `MECABRC` environment variable or `~/.mecabrc` to set up dictionaries.
gibasa 0.8.0
- Breaking Change: Changed numbering style of `sentence_id` when `split` is `FALSE`.
- Added `grain_size` argument to `tokenize`.
- Added new `bind_lr` function.
gibasa 0.7.4
- Use `RcppParallel::parallelFor` instead of `tbb::parallel_for`. There are no user-visible changes.
gibasa 0.7.0
- `tokenize` can now accept a character vector in addition to a data.frame-like object.
- `gbs_tokenize` is now deprecated. Please use the `tokenize` function instead.
gibasa 0.6.3
- Added the `partial` argument to `gbs_tokenize` and `tokenize`. This argument controls the partial parsing mode, which, when activated, forces the tokenizer to extract the given chunks of sentences.
gibasa 0.6.2
- Friendlier errors are now returned when an invalid dictionary path is provided.
- Added new `posDebugRcpp` function.
gibasa 0.5.1
- Added some new functions.
  - `bind_tf_idf2` can calculate and bind the term frequency, inverse document frequency, and tf-idf of the tidy text dataset.
  - `collapse_tokens`, `mute_tokens`, and `lexical_density` can be used for handling a tidy text dataset of tokens.