# Changelog

Source: `NEWS.md`
## gibasa 1.1.2

CRAN release: 2025-02-16

- Bumped minimum R version to 4.2.0.
- Refactored `tagger_impl` to improve the performance of `tokenize(split = TRUE)`.
## gibasa 1.1.1

CRAN release: 2024-07-06

- `tokenize` now warns rather than throws an error when an invalid input is given during partial parsing. With this change, `tokenize` is no longer aborted entirely if an invalid string is given; parsing of those strings is simply skipped.
## gibasa 1.1.0

CRAN release: 2024-02-17

- Corrected the probabilistic IDF calculation of `global_idf3`.
- Refactored `bind_tf_idf2`.
  - **Breaking Change:** Changed behavior when `norm = TRUE`. Cosine normalization is now performed on `tf_idf` values, as in the RMeCab package.
  - Added `tf = "itf"` and `idf = "df"` options.
## gibasa 0.9.5

CRAN release: 2023-07-09

- Removed the audubon dependency for maintainability.
- `pack` now preserves the `doc_id` type when it is a factor.
## gibasa 0.9.4

CRAN release: 2023-06-03

- Updated Makevars for Unix-alikes. Users can now use a file specified by the `MECABRC` environment variable or `~/.mecabrc` to set up dictionaries.
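As a minimal sketch of the setup described above (the `dicdir` path below is a hypothetical example; point it at wherever your MeCab dictionary is actually installed):

```shell
# Hypothetical example: write a minimal mecabrc-style config naming the
# dictionary directory. Adjust dicdir to your actual MeCab dictionary path.
cfg="${TMPDIR:-/tmp}/mecabrc-example"
printf 'dicdir = /usr/local/lib/mecab/dic/ipadic\n' > "$cfg"

# gibasa reads the config file named by MECABRC (or ~/.mecabrc by default).
export MECABRC="$cfg"
```

In practice you would usually just create `~/.mecabrc` with a `dicdir` line instead of exporting `MECABRC`; the environment variable is useful when switching between several dictionaries.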
## gibasa 0.8.0

- **Breaking Change:** Changed the numbering style of `sentence_id` when `split` is `FALSE`.
- Added a `grain_size` argument to `tokenize`.
- Added a new `bind_lr` function.
## gibasa 0.7.4

- Use `RcppParallel::parallelFor` instead of `tbb::parallel_for`. There are no user-visible changes.
## gibasa 0.7.0

- `tokenize` can now accept a character vector in addition to a data.frame-like object.
- `gbs_tokenize` is now deprecated. Please use the `tokenize` function instead.
## gibasa 0.6.3

- Added the `partial` argument to `gbs_tokenize` and `tokenize`. This argument controls the partial parsing mode, which, when activated, forces the parser to extract the given chunks of sentences.
## gibasa 0.6.2

- Friendlier errors are now returned when an invalid dictionary path is provided.
- Added a new `posDebugRcpp` function.
## gibasa 0.5.1

- Added some new functions.
  - `bind_tf_idf2` can calculate and bind the term frequency, inverse document frequency, and tf-idf of a tidy text dataset.
  - `collapse_tokens`, `mute_tokens`, and `lexical_density` can be used for handling a tidy text dataset of tokens.