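| Syllables | Japanese | Romaji | English Translation |
| --------- | -------- | ------ | ------------------- |
| 5 | うめがえに | ume ga e ni | at the plum branch |
| 7 | きゐるうぐひす | kiiru uguhisu | warbler came |
| 5 | はるかけて | haru kakete | cries over spring |
| 7 | なけどもいまだ | nake domo imada | even though it cries |
| 7 | ゆきはふりつつ | yuki ha furi tsutsu | snow keeps falling |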

| No. | Translator | Year | Pages | Translation Style |
| --- | ---------- | ---- | ----- | ----------------- |
| 1. | Kaneko Motoomi* | 1933 | 1,105 | Literal translation |
| 2. | Kubota Utsubo | 1960 | 1,449 | Literal translation |
| 3. | Matsuda Takeo | 1968 | 1,998 | Free translation |
| 4. | Ozawa Masao | 1971 | 544 | Changes word order and grammar |
| 5. | Takeoka Masao | 1976 | 2,278 | Literal translation |
| 6. | Okumura Tsuneya | 1978 | 434 | Respects author's intent |
| 7. | Kusojin Hitaku | 1979 | 1,260 | Supplements words |
| 8. | Komachiya Teruhiko | 1982 | 407 | Unknown |
| 9. | Kojima Noriyuki & Arai Eizo | 1989 | 483 | Unknown |
| 10. | Katagiri Yoichi | 1998 | 3,022 | Literal translation |

| Technique description | Example |
| --------------------- | ------- |
| Compressing sentences into words | 梅の花 (ume no hana) |
| Using Chinese characters to condense meaning | 朝露 (asa tsuyu) |
| Avoiding repetition for emphasis | 降りつつ (furi tsutsu) |
| Abstracting emotions | 鳴く→cry(birds)..cry(human) |
| Omitting unnecessary words | 白露 (shira tsuyu) |
| Leaving interpretation to the reader | 白...雪/snow, 花/flower |

<div style="text-align: right;">
Spring Haze
<img src="./images/harugasumi2.webp" height="300mm" />
<img src="./images/harugasumi.jpg" height="300mm" />
</div>

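- Only 31 syllables, arranged in lines of 5, 7, 5, 7, and 7 sounds
- Characterized by concise expression of nature and emotion
- Rhetorical devices: kakekotoba (pivot words), makurakotoba (pillow words), jokotoba (prefatory phrases)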

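梅の枝に来馴れている鴬が、冬時分からこの春へかけて頻りに鳴くけれども、未だ雪は降り降りして、一向春めかぬことよ。

(The warbler that is used to coming to the plum branch has been singing constantly from wintertime into this spring, yet the snow still keeps falling and falling, and it does not feel like spring at all.)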

### **Poetic rules may include:**

- omission of grammatical elements
- inversion of word order
- symbolic substitution
- nominalization
- manipulation of ambiguity
- compression of meaning
- expansion of meaning
- reinterpretation of context ...

---

### **A: Kokinshu 1000 original dataset (OP)**

- **[Hachidaishu Classical Japanese Poetic Vocabulary Dataset](https://zenodo.org/records/14001396)** on Zenodo contains the original poems of the Hachidaishu (including the Kokinshu) and their semantic codes.
- Creators: Yamamoto, Hilofumi and Hodošček, Bor
- Published: October 28, 2024 / Version v1.0.1
- [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14001396.svg)](https://doi.org/10.5281/zenodo.14001396)

---

### **_Kokinwakashu Hyoshaku_ by Motoomi Kaneko**

- Only Kaneko Motoomi's translation is available on Zenodo.
- [Kokinwakashu Hyoshaku by Motoomi Kaneko translation sentence vocabulary dataset](https://zenodo.org/records/13942707)
- Creators: Hilofumi Yamamoto, Bor Hodošček, and Xudong Chen
- Published: October 16, 2024 / Version v1.0.1
- [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13942707.svg)](https://doi.org/10.5281/zenodo.13942707)

https://github.com/masayu-a/WLSP

### **Steps**

- Step 1: Prepare the Kokinshu 1000 original dataset (OP).
- Step 2: Prepare 10 kinds of translation datasets (CT).
- Step 3: Divide both OP and CT sentences into tokens.
- Step 4: Attach meta codes based on WLSP (semantic principle codes) to each token.
- Step 5: Compare OP with CT by meta codes.
- Step 6: Describe the predication construction patterns.
- Step 7: Describe the noun phrase construction patterns.
- Step 8: Model the poetic construction.
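Steps 3 to 5 can be illustrated with a minimal Python sketch. It only shows the subtraction idea (CT tokens whose code has no counterpart in OP are left over as a residual) and is not the algorithm implemented in code2match. The function name `compare_by_code`, the token lists, and the code values such as `C-PLUM` are hypothetical placeholders; real codes would come from the datasets.

```python
from collections import Counter


def compare_by_code(op_tokens, ct_tokens):
    """Split CT tokens into those matched by an OP code and the residual.

    Each token is a (code, surface) pair. Codes are treated as a multiset,
    so one OP token can absorb at most one CT token with the same code.
    """
    remaining = Counter(code for code, _ in op_tokens)
    matched, residual = [], []
    for code, surface in ct_tokens:
        if remaining[code] > 0:
            remaining[code] -= 1
            matched.append((code, surface))
        else:
            residual.append((code, surface))
    return matched, residual


# Hypothetical codes, for illustration only (not real meta codes).
op = [("C-PLUM", "うめ"), ("C-BRANCH", "え"), ("C-SNOW", "ゆき"), ("C-FALL", "ふり")]
ct = [("C-PLUM", "梅"), ("C-BRANCH", "枝"), ("C-WINTER", "冬"),
      ("C-SNOW", "雪"), ("C-FALL", "降り"), ("C-FALL", "降り")]

matched, residual = compare_by_code(op, ct)
print("matched :", [surface for _, surface in matched])
print("residual:", [surface for _, surface in residual])
```

In this toy input the second 降り and the added 冬 end up in the residual, which is the kind of material the later steps would describe as expansion in the translation.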

```
BG-01-1000-00-000-X:こそあど%demonstrative_pronoun
BG-01-1100-00-000-X:類・例%class,kinds
BG-02-1000-00-000-X:抽象的関係%abstract_relation
BG-02-1110-00-000-X:関係%relation
BG-03-3100-00-000-X:ことば・言語%language_and_speech
BG-03-3400-00-000-X:身上%personal_affairs
BG-04-1100-00-000-X:接続%conjunction
BG-05-0000-00-000-X:接頭辞%prefix
BG-06-0000-00-000-X:接中辞%infix
BG-07-0000-00-000-X:接尾辞%suffix
BG-08-0061-00-000-X:助詞-格助詞-一般%case_particle
BG-09-0000-00-000-X:助動詞%auxiliary_verb
BG-10-0000-00-000-X:補助動詞・補助形容詞%auxiliary_verb_and_auxiliary_adjective
BG-11-0000-00-000-X:関係詞%relative_pronoun
BG-12-0000-00-000-X:語尾%word_endings
BG-13-0000-00-000-X:前置詞・介詞%preposition_and_postposition
BG-14-0000-00-000-X:意味不明%meaning_unknown
BG-15-0000-00-000-X:固有名詞%proper_noun
BG-16-0000-01-000-X:句点読点%punctuation
BG-17-0000-00-000-X:掛詞処理%wordplay_handling
BG-18-0000-00-000-X:助数詞%counting
```
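Entries in this form can be split mechanically. The sketch below assumes only what is visible in the listing: a hyphen-separated code, a colon, a Japanese label, a percent sign, and an English label. The names `MetaCode` and `parse_meta_code` are hypothetical, and the individual code fields are kept as plain strings because their meaning is not spelled out here.

```python
from typing import NamedTuple, Tuple


class MetaCode(NamedTuple):
    code: str                 # full code, e.g. "BG-08-0061-00-000-X"
    fields: Tuple[str, ...]   # hyphen-separated parts of the code
    label_ja: str             # Japanese category label
    label_en: str             # English annotation


def parse_meta_code(line: str) -> MetaCode:
    """Parse one 'CODE:label_ja%label_en' entry from the listing."""
    # Split on the first colon only: the Japanese label may contain hyphens.
    code, _, labels = line.strip().partition(":")
    label_ja, _, label_en = labels.partition("%")
    return MetaCode(code, tuple(code.split("-")), label_ja, label_en)


entry = parse_meta_code("BG-08-0061-00-000-X:助詞-格助詞-一般%case_particle")
print(entry.fields[1], entry.label_ja, entry.label_en)
```

Grouping tokens by a prefix of `fields` (for example the first two parts) would give the coarse categories used in the filters shown in later listings, such as `/BG-01/`.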

## **Computer Tools**

### **code2match.c**

- Align waka with contemporary translations
- github: [https://github.com/yamagen/code2match](https://github.com/yamagen/code2match)

```
% cat op_file.txt ct_file.txt | code2match -a
```

---

### code2match -h

<div class="dataset">
% code2match [-ahv] file....
-a print all data
-b print between check
-c print calculation table
-d print predicate part out
-e once matched out (bag of words option) use it with other options
-i print calculation in line style
-l print token list table
-o print original poem out
-p print pair token table
-r print residual
-s print valid on
-t print title
-u print unmatched portion
-h print this help
-v print code2match version
(c) 2025 H. Yamamoto yamagen@ila.titech.ac.jp
</div>

---

### **Script to run code2match**

```bash
#!/bin/sh
# This script compares two directories containing Waka poems and their translations.
if [ "$#" -lt 3 ]; then
  echo "Usage: $0 <dir1> <dir2> <id> [option]"
  exit 1
fi

DIR1="$1"
DIR2="$2"
ID=$(printf "%04d" "$3")  # ID can be 1-9999, so we format it to 4 digits
OPTION="$4"               # Optional argument for code2match

cat "$DIR1/$ID.db.txt" "$DIR2/$ID.db.txt" | ../src/code2match $OPTION
```

---

### **Script: loop 1-1000 to run code2match**

```
#!/bin/sh
# args: $1 = kokin directory name (e.g., kokin)
#       $2 = contemporary translation directory name (e.g., kaneko)
#       $3 = poem ID or range (e.g., 1, 100, or 1-100)
#       $4 = optional argument for code2match (e.g., -d, -r)

SRC=../src/code2match

# judge if $3 is a range or a single number
if echo "$3" | grep -qE '^[0-9]+-[0-9]+$'; then
  START=$(echo "$3" | cut -d- -f1)
  END=$(echo "$3" | cut -d- -f2)
else
  START=$3
  END=$3
fi

# Loop through the specified range or single number
for i in $(seq "$START" "$END"); do
  FILE1="$1/$(printf '%04d' "$i").db.txt"
  FILE2="$2/$(printf '%04d' "$i").db.txt"
  if [ -n "$4" ]; then
    cat "$FILE1" "$FILE2" | "$SRC" "$4"
  else
    cat "$FILE1" "$FILE2" | "$SRC"
  fi
done
```

- John Tukey's _Exploratory Data Analysis_ (EDA) is a good starting point. It is a foundational work that introduced the stem-and-leaf display as an effective way to visualize data distributions. Box's methodology is also relevant.
- We will seek evidence, but even more than that,

- **"Not a black box" assurance**
  Both cluster analysis and visualization are conducted with the researchers manually verifying the correspondence between data changes and hypotheses, which is crucial in linguistic research.
- **"Small examples lead to bigger/better understanding"**
  Analysis is not only about the final results but also about understanding the individual transformations. For example, showing how "春霞 → 春 + ... + 霞" or "ふりつつ/furitsutsu → 降り降りして/furi furi shite" unfolds helps convey what the analysis means.
- **"Hands-on approach"**
  Working hands-on lets researchers examine each transformation directly rather than only the aggregate results. This is especially important in linguistic research, where the meaning of the data is often complex and nuanced.

### **Four Seasons Sections of Kokin Wakashū**

| Section | Volume Number | Part | Poem Numbers | Number of Poems |
| ------- | ------------- | ------------ | ------------ | --------------- |
| Spring | Volume 1 | Spring Upper | 1-68 | 68 poems |
| Spring | Volume 2 | Spring Lower | 69-134 | 66 poems |
| Summer | Volume 3 | Summer | 135-168 | 34 poems |
| Autumn | Volume 4 | Autumn Upper | 169-248 | 80 poems |
| Autumn | Volume 5 | Autumn Lower | 249-313 | 65 poems |
| Winter | Volume 6 | Winter | 314-342 | 29 poems |

---

### **Content words**

```
$ ./c2m.sh kokin kaneko 1-100 -r| awk '/BG-01/{print $9, $10}' | sort | uniq -c | sort -nr | nl | head -20
 1 41 BG-01-5530-12-010-A 花 flower
 2 39 BG-01-1010-01-020-A こと thing
 3 18 BG-01-1000-01-050-A それ that
 4 16 BG-01-1000-03-010-A もの thing
 5 13 BG-01-2000-06-080-A 人 person
 6 11 BG-01-5520-20-040-A 梅 plum
 7 10 BG-01-5520-20-100-A 桜 cherry
 8 10 BG-01-1000-01-020-A これ this
 9  9 BG-01-2000-01-300-A 自分 self
10  9 BG-01-1610-01-010-A 時 time
11  8 BG-01-1624-02-010-A 春 spring
12  6 BG-01-4000-01-080-A 物 thing
13  6 BG-01-1990-05-030-A さえ even
14  6 BG-01-1770-01-050-A 外 outside
15  6 BG-01-1642-01-030-A 昔 past
16  6 BG-01-1610-03-020-A 間 while
17  5 BG-01-5153-07-010-A 雪 snow
18  5 BG-01-3066-02-080-A はず should
19  5 BG-01-1770-01-030-A 内 inside
20  5 BG-01-1641-01-010-A 今 now
```

---
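In the spirit of the stem-and-leaf display from Tukey's EDA mentioned earlier, the twenty frequencies in the listing above can be summarized in a few lines. This is a minimal sketch: the counts are copied by hand from the listing, and `stem_and_leaf` is a hypothetical helper, not part of code2match.

```python
from collections import defaultdict

# Frequencies of the top-20 residual nouns, copied from the listing above.
counts = [41, 39, 18, 16, 13, 11, 10, 10, 9, 9,
          8, 6, 6, 6, 6, 6, 5, 5, 5, 5]


def stem_and_leaf(values):
    """Return stem-and-leaf lines with tens as stems and units as leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(digit) for digit in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


print("\n".join(stem_and_leaf(counts)))
```

The display makes the shape of the distribution easy to see: most nouns fall in the lowest stems, while 花 and こと sit apart at the top.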

### **Remarks**

- The 31-syllable form is not a fixed structure but a flexible framework.
- Poets use the 31-syllable form to express their emotions and thoughts in a concise manner.
- Preference for generic yet symbolic nouns (e.g., "hana" = flower) over specific ones (e.g., "hana tachibana").
- Abstract, deictic, or self-referential nouns are often avoided.
- Temporal references are not expressed through general time words but through poetic imagery.

---

---

#### **Command executed**

```bash
./c2m.sh kokin kaneko 1-100 -d \
  | awk '{print length($0), $0}' \
  | sort -nr \
  | nl \
  | head -10
```

- Extracted the top 10 longest predicate mappings between the Kokin and Kaneko corpora
- Each line shows:
  - Length in characters
  - Original predicate span ([start|…|end])
  - ⇒ Transformed predicate span in the contemporary translation

---

<div class="datasetsmall">
$ ./c2m.sh kokin kaneko 1-100 -d| awk '{print length($0), $0}' | sort -nr | nl | head -10
1 148 PRED: kaneko 86 [21|ふく|らむ|22] => [07|吹か|ぬ|時|に|も|、|雪|の|よう|に|ひたすら|散る|が|、|それ|さえ|以て|惜しく|ある|もの|を|、|また|この上|どのように|烈しく|散れ|と|いっ|て|、|こう|も|風|が|吹く|の|で|あろ|う|57]
2 111 PRED: kaneko 78 [21|まちみ|て|ちら|ば|ちら|なむ|29] => [35|待っ|て|み|て|、|いよいよ|来|ぬ|時|に|こそ|、|散る|なら|ば|お前|の|勝手|に|散っ|て|貰お|う|わ|64]
3 98 PRED: kaneko 36 [11|をり|て|かざさ|む|15] => [27|折り取っ|て|、|我が|容貌|の|老|も|隠れる|か|どう|か|と|、|試し|に|挿頭し|て|みよ|う|58]
4 96 PRED: kaneko 11 [01|き|ぬ|04] => => [02|た|と|世間|の|人|は|いう|が|、|まだ|鴬|は|鳴い|て|い|ない|、|自分|は|何でも|鴬|の|鳴か|ぬ|31]
5 94 PRED: kaneko 76 [12|をしへよ|いき|て|うらみ|む|18] => [32|教え|て|くれ|よ|、|然らば|、|そこ|に|行っ|て|思う存分|恨み|を|いお|う|52]
6 91 PRED: kaneko 61 [04|くははれ|る|05] => [11|加わっ|て|長く|なっ|た|今年|なり|とも|、|人|の|心|に|は|なぜ|厭か|れ|は|せ|ぬ|39]
7 85 PRED: kaneko 77 [06|ちり|な|む|11] => [08|散る|なら|ば|、|自分|も|一緒|に|何処|へ|なり|と|退散|し|て|しまお|う|32]
8 85 PRED: kaneko 74 [03|ちら|ば|ちら|なむ|ちら|ず|11] => [07|散る|なら|ば|勝手|に|散っ|て|貰お|う|、|たとえ|散ら|ず|22]
9 83 PRED: kaneko 45 [00|くる|と|あく|と|めかれ|ぬ|ものを|12] => [12|と|いっ|て|は|見|、|夜|が|明ける|と|いっ|て|31]
10 80 PRED: kaneko 63 [01|こ|ず|03] => [07|来れ|ば|こそ|、|この|桜|を|花|と|は|見|ますれ|、|若し|今日|来|ぬ|28]
</div>

---
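Ranking by raw line length mixes the length of the original predicate with that of its translation. A token-level view can be obtained by parsing each `PRED:` line, as in the following sketch. The line format is inferred only from the listing above (translator, poem number, then two `[start|token|...|end]` spans joined by `=>`), and `parse_pred` and `parse_span` are hypothetical helper names rather than part of code2match.

```python
import re

# One PRED line: translator, poem number, OP span, one or more "=>", CT span.
PRED_RE = re.compile(
    r"PRED:\s+(\S+)\s+(\d+)\s+\[(.*?)\](?:\s*=>)+\s*\[(.*?)\]"
)


def parse_span(span):
    """Split 'start|tok|...|end' into (start, tokens, end)."""
    parts = span.split("|")
    return int(parts[0]), parts[1:-1], int(parts[-1])


def parse_pred(line):
    """Return the aligned predicate tokens of one line, or None if no match."""
    match = PRED_RE.search(line)
    if match is None:
        return None
    translator, poem, op_span, ct_span = match.groups()
    _, op_tokens, _ = parse_span(op_span)
    _, ct_tokens, _ = parse_span(ct_span)
    # Expansion ratio: CT tokens per OP token (None if the OP span is empty).
    ratio = len(ct_tokens) / len(op_tokens) if op_tokens else None
    return {"translator": translator, "poem": int(poem),
            "op": op_tokens, "ct": ct_tokens, "ratio": ratio}


line = ("7 85 PRED: kaneko 77 [06|ちり|な|む|11] => "
        "[08|散る|なら|ば|、|自分|も|一緒|に|何処|へ|なり|と|退散|し|て|しまお|う|32]")
result = parse_pred(line)
print(result["poem"], len(result["op"]), len(result["ct"]), result["ratio"])
```

Sorting by `ratio` instead of line length would rank predicates by how much the translation expands them, which is closer to the compression question than character counts.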

- Ways to combine two Chinese characters:

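(... 川, etc.). These bases also derive words in other parts of speech (浅し → 浅緑, 浅み; 古 → 古す, 古し, 古枝, 古里, etc.; 露 → 露けし, 朝露, 下露, 白露, 露霜, and so on).

- Thus, compared with the Man'yōshū, the vocabulary of the Kokinshū has fewer kinds of word bases, but it uses the same base in varied ways and from many angles.
- Phrases such as "dew that has settled in the morning" and "dew that gleams white" are made to stand on their own as short words: 朝露 (asa tsuyu) and 白露 (shira tsuyu).
- This too is a kind of compression.
- To seal an idea inside a single poem, a short single word makes the expression tighter and allows more to be said.
- This free style of word formation can be said to follow the model of Chinese (kango) word formation.
- Sino-Japanese words are nativized (血涙 → 血の涙), while native Japanese words are coined as poetic words by Sino-Japanese methods of word formation, and compressed.
- Moreover, few words are coined with pure affixes (-sa, -ge, -sabu, -base, etc.). They appear only among interrogatives: 幾つ, 幾ら, いつか, いづこ, いづち, いつしか, いづら, いづれ, etc.
- This also shows that word structure in the Kokinshū is built cyclically by combining free word bases.
- The poets seem to have taken great pains over how to combine and use the same words beautifully.
- Words formed in this way are probably what is called poetic diction in the broad sense.
- Each word of the Kokinshū can be said to be carefully chosen, beautiful, and rich in meaning.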

---

### **Poetic compression**

- Poetic compression is a key feature of _waka_.
- It involves condensing complex ideas into concise phrases.

| **Normal narration (Expansion)** | **Poetic compression (Condensation)** |
| ----------------------------------- | ------------------------------------- |
| 梅の花を折ってしまったので | 梅の花 |
| Because I have picked the plum blossom | ume no hana |
| 朝に降りている露 | 朝露 |
| Dew that has settled in the morning | asa tsuyu |
| 白く光った露 | 白露 |
| Dew that gleams white | shira tsuyu |
| 鳴いていないけれども | かけてなけども |
| Not chirping, but | kakete nakedomo |
| ゆっくりと降り続いている雨 | 降り降りして |
| Rain that keeps falling slowly | furi furi shite |
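The degree of condensation in the table can be given a rough number. The sketch below compares character counts for three of the pairs above. This is an assumption made for illustration: character counts of mixed kanji and kana text are not the same as the syllable counts that the 31-syllable form constrains, and `compression_ratio` is a hypothetical helper.

```python
# (expanded narration, compressed form) pairs taken from the table above.
pairs = [
    ("梅の花を折ってしまったので", "梅の花"),
    ("朝に降りている露", "朝露"),
    ("白く光った露", "白露"),
]


def compression_ratio(expanded: str, compressed: str) -> float:
    """Characters in the compressed form divided by characters in the expansion."""
    return len(compressed) / len(expanded)


for expanded, compressed in pairs:
    ratio = compression_ratio(expanded, compressed)
    print(f"{expanded} -> {compressed}: {ratio:.2f}")
```

A syllable-based version would need kana readings for every token, which the character-level sketch deliberately leaves out.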

## **Discussion**

- Explore poetic compression in modern Japanese
- Analyze constraints in poetic expression
- Discuss implications for translation and interpretation
- Consider cultural and linguistic factors
- Identify and classify poetic strategies
- Analyze how poetic thought is transfigured
- Uncover underlying rules (overt and covert)
- Explore the implications of compression
- Simulate the transformation process

Readers can easily expand on an idea when they see basic terms. Conversely, the author chooses terms from which ideas are easy to expand.

### **Future research directions**

- Explore the description of poetic compression in other languages and cultures
- Invent new description methods for colloquial sentences
  - **_→Immediate grammar and syntax_**
- Develop an ecosystem for analyzing colloquial sentences in terms of:
  - **_→Semantic and syntactic compression_**
  - **_→A co-development environment for data ecosystems_**
  - **_→A collaborative platform for developing data ecosystems_**
  - **_→A co-creation environment of tools_**

---

Plotting Poetry 2025

Transforming Poetic Thought into Waka:

How to Pack the Skeleton into a 31-Syllable Closet

Basics of WAKA

Early established waka

Example from the Kokinshu

Example 2

Waka: Stylistic and rhetorical perspective

Preface of Kokinshū: Kanajo

Poetic ideas packed into the 31-syllable form

Methods

Obtain some typical conversion patterns from both

Through the comparison of OP and CT, we can obtain:

Materials

B: Ten sets of contemporary translations

Methods

Subtraction

Parallel comparison between OP and CT

OP: Kokinshu No. 3

CT: Kaneko No. 3

Meta-code system: hierarchical semantic categories

Examples of code categories with English annotation

Pair token table: code2match -p

Extract residual of Kaneko no. 5: code2match -r

Element breakdown between OP and CT: code2match -c

Predicate alignments between OP and CT: code2match -d

The compression of poetic thought into 31-syllable form: Questions

Considerations in approach

Why is it important that researchers go "hands-on"?

Results of the hands-on process

Nouns avoided in waka (top 20 residuals)

Key insights

Comment on nature themes as residuals

Key observations from predicate correspondence analysis

Word Types

Summary of poetic compression techniques encountered

Questions for discussion

Conclusion

References

References (cont.)
