Corpora Tools

read_aozora()
Download a text file from Aozora Bunko
read_ja_text8()
Read the ja.text8 corpus
read_jrte()
Read the JRTE Corpus
read_ldnws()
Read the Livedoor News Corpus

Utilities

clean_emoji()
Remove emojis
clean_url()
Remove URLs
download_unidic()
Download and unzip 'UniDic'
is_within_era()
Check if dates are within a Japanese era
jrte_rte_files()
Data for Textual Entailment
ldnws_categories()
List of categories of the Livedoor News Corpus
parse_jrte_reasoning()
Parse the reasoning column of 'rte.*.tsv'
parse_to_jdate()
Parse dates to Japanese dates
unidic_availables()
List of available 'UniDic' versions

Datasets

AozoraBunkoSnapshot
Metadata of text files published on Aozora Bunko
NekoText
Full text of ‘Wagahai Wa Neko Dearu’ by Natsume Souseki, from Aozora Bunko