### **Poetic Rules may include:**
- omission of grammatical elements
- inversion of word order
- symbolic substitution
- nominalization
- manipulation of ambiguity
- compression of meaning
- expansion of meaning
- reinterpretation of context
...
---
### **A: Kokinshu 1000 original dataset (OP)**
- **[Hachidaishu Classical Japanese Poetic Vocabulary Dataset](https://zenodo.org/records/14001396)** on Zenodo contains the original poems of the Hachidaishu (including the Kokinshu) and their semantic codes.
- Creators: Yamamoto, Hilofumi and Hodošček, Bor
- Published: October 28, 2024 / Version v1.0.1
- [doi:10.5281/zenodo.14001396](https://doi.org/10.5281/zenodo.14001396)
---
### **_Kokinwakashu Hyoshaku_ by Motoomi Kaneko**
- Only Kaneko Motoomi's translation is available on Zenodo.
- [Kokinwakashu Hyoshaku by Motoomi Kaneko translation sentence vocabulary dataset](https://zenodo.org/records/13942707)
- Creators: Hilofumi Yamamoto, Bor Hodošček, and Xudong Chen
- Published: October 16, 2024 / Version v1.0.1
- [doi:10.5281/zenodo.13942707](https://doi.org/10.5281/zenodo.13942707)
- WLSP (Word List by Semantic Principles): https://github.com/masayu-a/WLSP
### **Steps**
- Step 1: Prepare Kokinshu 1000 original dataset (OP).
- Step 2: Prepare 10 kinds of translation datasets (CT).
- Step 3: Divide both OP and CT sentences into tokens.
- Step 4: Attach Meta codes based on the WLSP (Word List by Semantic Principles) to each token.
- Step 5: Compare OP with CT by Meta codes.
- Step 6: Describe the predication construction patterns.
- Step 7: Describe the noun phrase construction patterns.
- Step 8: Model the poetic construction.
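The core of Steps 3–5 can be sketched in a few lines of shell. The tokens, codes, and file names below are hypothetical stand-ins, not values from the published datasets; the point is only the shape of the comparison: each token carries a semantic code, and OP and CT are compared code-by-code.

```sh
#!/bin/sh
# Hypothetical tokenized OP and CT files: one "token<TAB>code" pair per line.
printf 'haru\t1.1624\nkasumi\t1.5152\n'                  > op.txt
printf 'haru\t1.1624\nkasumi\t1.5152\nyukkuri\t3.1913\n' > ct.txt

# Step 5 in miniature: count CT tokens whose code also occurs in the OP.
MATCHED=$(awk -F'\t' 'NR==FNR {op[$2]=1; next} $2 in op {n++} END {print n+0}' op.txt ct.txt)
echo "matched tokens: $MATCHED"
rm -f op.txt ct.txt
```

Here `yukkuri` has no counterpart code in the OP, so it would surface as residual material added by the translator.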
## **Computer Tools**
### **code2match.c**
- Align waka with contemporary translations
- github: [https://github.com/yamagen/code2match](https://github.com/yamagen/code2match)
```sh
% cat op_file.txt ct_file.txt | code2match -a
```
---
### code2match -h
<div class="dataset">
% code2match [-ahv] file....
-a print all data
-b print between check
-c print calculation table
-d <span class="red">print predicate part out</span>
-e once matched out (bag of words option)
use it with other options
-i print calculation in line style
-l print token list table
-o <span class="red">print original poem out</span>
-p print pair token table
-r <span class="red">print residual</span>
-s print valid on
-t print title
-u print unmatched portion
-h print this help
-v print code2match version
(c) 2025 H. Yamamoto yamagen@ila.titech.ac.jp
</div>
---
### **Script to run code2match**
```bash
#!/bin/sh
# This script compares two directories containing Waka poems and their translations.
if [ "$#" -lt 3 ]; then
echo "Usage: $0 <dir1> <dir2> <id> [option]"
exit 1
fi
DIR1="$1"
DIR2="$2"
ID=$(printf "%04d" "$3") # ID can be 1-9999, so we format it to 4 digits
OPTION="$4" # Optional argument for code2match
cat "$DIR1/$ID.db.txt" "$DIR2/$ID.db.txt" | ../src/code2match $OPTION
```
---
### **Script: loop 1-1000 to run code2match**
```sh
#!/bin/sh
# args: $1 = kokin directory name (e.g., kokin)
# $2 = contemporary translation directory name (e.g., kaneko)
# $3 = poem ID or range (e.g., 1, 100, or 1-100)
# $4 = optional argument for code2match (e.g., -d, -r)
SRC=../src/code2match
# judge if $3 is a range or a single number
if echo "$3" | grep -qE '^[0-9]+-[0-9]+$'; then
START=$(echo "$3" | cut -d- -f1)
END=$(echo "$3" | cut -d- -f2)
else
START=$3
END=$3
fi
# Loop through the specified range or single number
for i in $(seq "$START" "$END"); do
FILE1="$1/$(printf '%04d' "$i").db.txt"
FILE2="$2/$(printf '%04d' "$i").db.txt"
if [ -n "$4" ]; then
cat "$FILE1" "$FILE2" | "$SRC" "$4"
else
cat "$FILE1" "$FILE2" | "$SRC"
fi
done
```
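The two helper steps the loop script relies on — splitting a range argument like `8-12` and zero-padding poem IDs to four digits — can be exercised on their own, without any poem files (the range value here is just an example):

```sh
#!/bin/sh
# Split a range argument into START/END, falling back to a single number.
RANGE="8-12"
if echo "$RANGE" | grep -qE '^[0-9]+-[0-9]+$'; then
  START=$(echo "$RANGE" | cut -d- -f1)
  END=$(echo "$RANGE" | cut -d- -f2)
else
  START=$RANGE
  END=$RANGE
fi
# Zero-pad to 4 digits, matching the NNNN.db.txt file naming.
FIRST_ID=$(printf '%04d' "$START")
LAST_ID=$(printf '%04d' "$END")
echo "IDs $FIRST_ID to $LAST_ID"
```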
- John Tukey's _Exploratory Data Analysis_ (EDA) is a good starting point: a foundational work that introduced the stem-and-leaf display as a way to visualize data distributions effectively.
- George Box's iterative approach to model building is also relevant.
- We will seek evidence, but even more than that, understanding.
- **"Not a black box" assurance**
Both cluster analysis and visualization are conducted with the researchers manually verifying the correspondence between data changes and hypotheses, which is crucial in linguistic research.
- **"Small examples lead to bigger/better understanding"**
The process of analyzing the data is not just about the final results, but also about understanding the individual transformations. For example, showing how "春霞 → 春 + ... + 霞" or "ふりつつ/furitsutsu → 降り降りして/furi furi shite" illustrates each transformation helps convey the meaning of the analysis.
- **"Hands-on approach"**
  The hands-on approach lets researchers explore the data directly rather than rely only on aggregate results. This is especially important in linguistic research, where the meaning of the data is often complex and nuanced.
### **Remarks**
- The 31-syllable form is not a fixed structure but a flexible framework.
- Poets use the 31-syllable form to express their emotions and thoughts in a concise manner.
- Preference for generic yet symbolic nouns (e.g., "hana" = flower) over specific ones (e.g., "hana tachibana").
- Abstract, deictic, or self-referential nouns are often avoided.
- Temporal references are not expressed through general time words but through poetic imagery.
---
### **Poetic compression**
- Poetic compression is a key feature of _waka_.
- It involves condensing complex ideas into concise phrases.
| **Normal narration (Expansion)** | **Poetic compression (Condensation)** |
| ----------------------------------- | ------------------------------------- |
| 梅の花を折ってしまったので | 梅の花 |
| Because I broke off a plum blossom | ume no hana |
| 朝に降りている露 | 朝露 |
| Dew that falls in the morning | asa tsuyu |
| 白く光った露 | 白露 |
| Dew that shines white | shira tsuyu |
| 鳴いていないけれども | かけてなけども |
| Though it is not singing | kakete nakedomo |
| ゆっくりと降り続いている雨 | 降り降りして |
| Rain that keeps slowly falling | furi furi shite |
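Pairs like those in the table above can be kept as plain tab-separated data and queried mechanically. The file name and layout below are illustrative only, not part of the published datasets:

```sh
#!/bin/sh
# Illustrative compression/expansion pairs: "compressed form<TAB>expansion".
printf 'asa tsuyu\t朝に降りている露\n'              > pairs.tsv
printf 'shira tsuyu\t白く光った露\n'               >> pairs.tsv
printf 'furi furi shite\tゆっくりと降り続いている雨\n' >> pairs.tsv

# Look up the expansion for one compressed form.
EXPANSION=$(awk -F'\t' '$1=="asa tsuyu" {print $2}' pairs.tsv)
echo "asa tsuyu -> $EXPANSION"
rm -f pairs.tsv
```

Storing the pairs as data, rather than prose, makes the expansion/compression relation available to the same token-level tooling used elsewhere in the pipeline.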
## **Discussion**
- Explore poetic compression in modern Japanese
- Analyze constraints in poetic expression
- Discuss implications for translation and interpretation
- Consider cultural and linguistic factors
- Identify and classify poetic strategies
- Analyze how poetic thought is transfigured
- Uncover underlying rules (overt and covert)
- Explore the implications of compression
- Simulate the transformation process
### **Future research directions**
- Explore the description of poetic compression in other languages and cultures
- Invent new description methods for colloquial sentences
**_→Immediate grammar and syntax_**
- Develop an ecosystem for analyzing colloquial sentences in terms of:
**_→Semantic and syntactic compression_**
**_→A co-development environment for data ecosystems_**
**_→A collaborative platform for developing data ecosystems_**
**_→A co-creation environment of tools_**
---