Parse XML output of CaboCha

Usage

ppn_parse_xml(
  path,
  into = c("POS1", "POS2", "POS3", "POS4", "X5StageUse1", "X5StageUse2", "Original",
    "Yomi1", "Yomi2"),
  col_select = seq_along(into)
)

Arguments

path

String; path to the XML file output by pipian::ppn_cabocha.

into

Character vector; names for the feature columns of the output.

col_select

Character or integer vector; names or positions of the features in into that will be kept in the result.

Value

A tibble with one row per token.

Examples

head(ppn_parse_xml(system.file("sample.xml", package = "pipian")))
#> # A tibble: 6 × 19
#>   doc_id sentence_id chunk_id token_id token   chunk_link chunk_score chunk_head
#>    <int>       <int>    <int>    <int> <chr>        <int>       <dbl>      <int>
#> 1      1           1        1        0 ふと             2        1.29          1
#> 2      1           1        2        1 振り向く……         37       -2.34          2
#> 3      1           1        2        2 と              37       -2.34          2
#> 4      1           1        2        3 、              37       -2.34          2
#> 5      1           1        3        4 たくさん……          4        1.93          5
#> 6      1           1        3        5 の               4        1.93          5
#> # ℹ 11 more variables: chunk_func <int>, entity <chr>, POS1 <chr>, POS2 <chr>,
#> #   POS3 <chr>, POS4 <chr>, X5StageUse1 <chr>, X5StageUse2 <chr>,
#> #   Original <chr>, Yomi1 <chr>, Yomi2 <chr>
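
Because the result has one row per token, a common next step is to collapse tokens back into chunks. The following is a minimal sketch in base R, not part of pipian: collapse_chunks is a hypothetical helper, and the small data frame is typed in by hand from the first six rows printed above (tokens copied as printed) so that the block runs without CaboCha output. It assumes chunk_link is constant within a chunk, as in the printed rows.

```r
# Hypothetical helper (not part of pipian): one row per chunk,
# with the token surfaces pasted together in row order.
collapse_chunks <- function(tokens) {
  tokens <- as.data.frame(tokens)
  # chunk_link is assumed constant within a chunk, so it can be a grouping key.
  keys <- c("doc_id", "sentence_id", "chunk_id", "chunk_link")
  out <- aggregate(tokens["token"], by = tokens[keys], FUN = paste, collapse = "")
  names(out)[names(out) == "token"] <- "chunk"
  out <- out[order(out$doc_id, out$sentence_id, out$chunk_id), ]
  rownames(out) <- NULL
  out
}

# Hand-typed stand-in for the first six rows of the example output.
tokens <- data.frame(
  doc_id = 1L,
  sentence_id = 1L,
  chunk_id = c(1L, 2L, 2L, 2L, 3L, 3L),
  token_id = 0:5,
  token = c("ふと", "振り向く……", "と", "、", "たくさん……", "の"),
  chunk_link = c(2L, 37L, 37L, 37L, 4L, 4L)
)

chunks <- collapse_chunks(tokens)
print(chunks)
```

The same helper should accept the tibble returned by ppn_parse_xml directly, since it only relies on the doc_id, sentence_id, chunk_id, chunk_link and token columns shown in the output above.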