Visualizing, and learning, the relationships among kanji, words, and morphemes

I made a free, open-source tool for visualizing the relationships between kanji, words, and morphemes in sentences. [Feel free to take a look](https://japanesegraph.com), or see a demo [on GitHub](https://github.com/mreichhoff/HanziGraph/tree/JapaneseGraph#demo). You can also link to specific words (like [永遠](https://japanesegraph.com/%E6%B0%B8%E9%81%A0)), or, on supported browsers (should be all but [Firefox, for now](https://bugzilla.mozilla.org/show_bug.cgi?id=1423593)), have it [break down a sentence](https://japanesegraph.com/%E3%82%AB%E3%83%BC%E3%83%89%E3%81%A7%E6%94%AF%E6%89%95%E3%81%88%E3%81%BE%E3%81%99%E3%81%8B).

(end TL;DR)

The tool has a few ways of studying:

**Kanji node network**

The idea is to build mental connections among kanji using a node network diagram.

I analyzed millions of subtitles and the Tanaka corpus to find the most common words, picked out the words containing kanji, and connected any two kanji that appear in the same word. The resulting network is color-coded by word frequency (or, optionally, by JLPT level).
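As a rough illustration of the connection step, here’s a minimal Python sketch, not the tool’s actual code. It assumes you already have a word-to-frequency mapping (the `build_kanji_graph` name and the sample data are hypothetical), and it treats only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as kanji, which is a simplification. Each pair of kanji sharing a word becomes an edge, and the edge remembers which words connect them, most frequent first.

```python
from collections import defaultdict
from itertools import combinations


def is_kanji(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def build_kanji_graph(word_freqs: dict[str, int]) -> dict[str, dict[str, list[str]]]:
    """Map each kanji to its neighboring kanji, recording the words that connect them."""
    graph: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    # Visit words from most to least frequent so each edge's word list stays frequency-ordered.
    for word in sorted(word_freqs, key=word_freqs.get, reverse=True):
        kanji = sorted(set(ch for ch in word if is_kanji(ch)))
        # Every pair of distinct kanji in the same word becomes an undirected edge.
        for a, b in combinations(kanji, 2):
            graph[a][b].append(word)
            graph[b][a].append(word)
    return graph


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    graph = build_kanji_graph({"永遠": 120, "永久": 80, "遠足": 40, "する": 9000})
    for neighbor, words in graph["永"].items():
        print(neighbor, words)
```

Color-coding would then be a lookup from each word (or kanji) to its frequency rank or JLPT level.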

The kanji and words have definitions (from JMdict) and example sentences (human-written, from the Tanaka corpus via Tatoeba; most with furigana that can be toggled on and off).

You can also see [my post about an earlier version from last year](https://www.reddit.com/r/LearnJapanese/comments/u5n2xf/learning_kanji_through_the_words_that_connect_them/).

**Sankey flow diagrams**

I also added a way of visualizing how [morphemes](https://en.wikipedia.org/wiki/Morpheme) are used together to form phrases and sentences. [Here’s an example](https://japanesegraph.com/生きる/flow).

This was done by:
* Segmenting millions of sentences with [MeCab](https://taku910.github.io/mecab/) to get the morphemes (MeCab doesn’t quite work at what is usually considered the word level, but the idea is similar to splitting a sentence into words).
* For each morpheme found, finding the 12 most common chains of 2 and 3 segments that include the morpheme.
* Rendering a [Sankey diagram](https://en.wikipedia.org/wiki/Sankey_diagram) to illustrate the flow among segments. This type of diagram is often used for cost breakdowns, energy usage, and other processes, but it’s useful for thinking about how sentences flow, too. Taller bars mean a connection is more common, so you can quickly see how a word forms phrases, what its most common forms are, and what most commonly precedes or follows each form.
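The chain-finding and link-building steps could look something like the sketch below. It assumes each sentence has already been segmented (e.g., by MeCab) into a list of strings; `top_chains` and `sankey_links` are hypothetical names, not the tool’s actual code. Nodes are keyed by their column relative to the chosen morpheme, which keeps the diagram acyclic even when a segment repeats, since many Sankey layout libraries reject cycles.

```python
from collections import Counter

Chain = tuple[str, ...]
Node = tuple[int, str]  # (column relative to the morpheme, segment text)


def top_chains(sentences: list[list[str]], morpheme: str, limit: int = 12) -> list[tuple[Chain, int]]:
    """Count every 2- and 3-segment window containing `morpheme`; keep the `limit` most common."""
    counts: Counter = Counter()
    for segments in sentences:
        for n in (2, 3):
            for i in range(len(segments) - n + 1):
                chain = tuple(segments[i:i + n])
                if morpheme in chain:
                    counts[chain] += 1
    return counts.most_common(limit)


def sankey_links(chains: list[tuple[Chain, int]], morpheme: str) -> dict[tuple[Node, Node], int]:
    """Sum chain counts into weighted links between (column, segment) nodes.

    Column 0 is the morpheme itself, -1 the segment before it, +1 the one after, and so on.
    """
    links: Counter = Counter()
    for chain, count in chains:
        k = chain.index(morpheme)  # the first occurrence anchors column 0
        for j in range(len(chain) - 1):
            links[((j - k, chain[j]), (j + 1 - k, chain[j + 1]))] += count
    return dict(links)


if __name__ == "__main__":
    # Hypothetical pre-segmented sentences.
    sentences = [["生きる", "ため", "に"], ["生きる", "ため", "に"], ["よく", "生きる"]]
    chains = top_chains(sentences, "生きる")
    for (source, target), weight in sankey_links(chains, "生きる").items():
        print(source, "->", target, weight)
```

One caveat: a 3-segment chain contains a 2-segment one, so summing both counts the shared link twice; deduplicating or keeping only the longest chains are both reasonable alternatives.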

**Cumulative kanji frequency graphs**

The tool also has cumulative kanji usage graphs. These were derived by:
* Analyzing millions of sentences from a few sources in [OPUS](https://opus.nlpl.eu/) (subtitles, Wikipedia articles, UN documents, etc.) to count how many times each kanji was used.
* Plotting the cumulative percentage of kanji occurrences you’d recognize after learning each kanji, assuming you learn them in order of frequency.
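The counting and cumulative-percentage steps are simple enough to sketch in a few lines of Python. This assumes the CJK Unified Ideographs block stands in for "kanji", and the function names are hypothetical. The curve is a running sum over kanji sorted by count, divided by the total number of kanji occurrences.

```python
import re
from collections import Counter

# Assumption: "kanji" means characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
KANJI = re.compile(r"[\u4e00-\u9fff]")


def kanji_counts(sentences: list[str]) -> Counter:
    """Count how many times each kanji occurs across all sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(KANJI.findall(sentence))
    return counts


def cumulative_coverage(counts: Counter) -> list[tuple[str, float]]:
    """Return (kanji, cumulative % of all kanji occurrences) in descending frequency order."""
    total = sum(counts.values())
    if total == 0:
        return []
    running = 0
    curve = []
    for kanji, n in counts.most_common():
        running += n
        curve.append((kanji, 100 * running / total))
    return curve


if __name__ == "__main__":
    for kanji, pct in cumulative_coverage(kanji_counts(["日本の日", "本日は晴れ"])):
        print(f"{kanji}: {pct:.1f}%")
```

For example, with the sentences 日本の日 and 本日は晴れ, 日 accounts for 3 of the 6 kanji occurrences, so learning it first covers 50%.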

This way, you can see how much bang for your buck (so to speak) you’re getting out of learning a given kanji.

**Other**

The tool also:
* Creates recall, recognition, and cloze flashcards, which you can export to Anki or study directly in the tool.
* Recommends kanji based on what you’ve seen.
* Runs entirely in-browser. You can create an account if you want to sync your stats and flashcards across devices, but otherwise there’s no need for any sort of account.
* Can be installed as [a PWA](https://en.wikipedia.org/wiki/Progressive_web_app) for an app-like experience.
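For the cloze and export features, here’s a minimal sketch of one possible approach, not the tool’s actual card format. It assumes Anki’s plain-text import (tab-separated fields) for recognition and recall cards, and Anki’s `{{c1::...}}` cloze syntax for exported cloze cards; the function names and card layouts are hypothetical.

```python
import csv
import io


def cloze(sentence: str, word: str, anki: bool = False) -> str:
    """Hide the first occurrence of `word` in `sentence`.

    In-app cards get a blank of the same length; Anki cards get cloze markup.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    hidden = "{{c1::" + word + "}}" if anki else "＿" * len(word)
    return sentence.replace(word, hidden, 1)


def basic_cards_tsv(entries: list[tuple[str, str, str]]) -> str:
    """Emit recognition (word -> meaning) and recall (meaning -> word) rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for word, reading, meaning in entries:
        writer.writerow([word, f"{reading} / {meaning}"])  # recognition card
        writer.writerow([meaning, f"{word} ({reading})"])  # recall card
    return buf.getvalue()


if __name__ == "__main__":
    print(cloze("カードで支払えますか", "支払え"))
    print(cloze("カードで支払えますか", "カード", anki=True))
    print(basic_cards_tsv([("永遠", "えいえん", "eternity")]), end="")
```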

**Future work**

* Integration with AnkiConnect, the same way Yomichan integrates with Anki.
* Maybe: adding AI-generated example sentences to augment the human-written ones (I’d never replace the human ones though).
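AnkiConnect exposes a local HTTP API that accepts JSON requests, so a minimal sketch of what such an integration might send could look like this. It assumes AnkiConnect’s default address (`http://127.0.0.1:8765`), API version 6, and Anki’s stock "Basic" note type with `Front`/`Back` fields; the helper names are hypothetical. In the browser, the same JSON would go through `fetch`, and the site’s origin would need to be allowed in AnkiConnect’s `webCorsOriginList` setting.

```python
import json
import urllib.request

ANKICONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default listen address


def add_note_request(deck: str, front: str, back: str, tags: list[str]) -> dict:
    """Build an AnkiConnect `addNote` request for the stock "Basic" note type."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": "Basic",
                "fields": {"Front": front, "Back": back},
                "tags": tags,
            }
        },
    }


def send(request: dict) -> object:
    """POST a request to a running AnkiConnect instance and return its `result` field."""
    data = json.dumps(request).encode("utf-8")
    with urllib.request.urlopen(ANKICONNECT_URL, data) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


if __name__ == "__main__":
    # Only builds and prints the request; call send(...) with Anki and AnkiConnect running.
    print(json.dumps(add_note_request("Japanese", "永遠", "eternity", ["japanesegraph"]), ensure_ascii=False))
```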
