An analysis of Kanji in the Japanese Clozemaster course

In preparation for [my upcoming attempt](https://www.reddit.com/r/clozemaster/comments/zl9t4f/which_language_should_i_livestream_myself/) to complete the entire Japanese Clozemaster Fluency Fast Track in 2023, I did a little digging into the contents of the course, to get an idea of just how proficient it could make me. I scraped all 19,999 sentences from the course and ran some textual analysis on them. The results may be useful to anyone considering using Clozemaster to learn Japanese to a proficient level!

In particular, I was interested in what kanji the course would teach, since I knew there were educational standards around learning kanji and I thought it would be a good way to benchmark progress. The analyses I did simply looked at which CJK characters appeared in sentences without regard for how they were being used or pronounced, since Clozemaster doesn’t provide translations or phonetic transcriptions of individual words. Since kanji can be pronounced in multiple ways and have multiple meanings, appearance of a particular character in the course doesn’t guarantee full mastery of that kanji, but I think the results I did find are interesting nonetheless.
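As a rough illustration of what "which CJK characters appeared" can mean in code, here is a minimal sketch. It is not taken from the linked repository. It assumes that "kanji" means any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF), so kana, punctuation, and any ideographs encoded outside that block are ignored. The function name `kanji_in` is made up for this example.

```python
def kanji_in(text: str) -> set[str]:
    """Return the distinct characters of `text` that fall in the
    CJK Unified Ideographs block (an assumption about what counts as kanji)."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


# Hiragana such as は, を, and します are dropped; only the ideographs remain.
example = kanji_in("私は日本語を勉強します")
print(sorted(example))
```

Because the result is a set, a character that occurs twice in one sentence is only reported once, which matches the idea of asking whether a kanji appears in a sentence at all.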

I pulled the [list of jōyō kanji](https://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji) from Wikipedia. These are the kanji intended to be taught to Japanese students at each grade level through secondary school. The list comprises 2,136 characters, broken into eight groups:

– Grade 1 (80 kanji)
– Grade 2 (160 kanji)
– Grade 3 (200 kanji)
– Grade 4 (200 kanji)
– Grade 5 (185 kanji)
– Grade 6 (181 kanji)
– Prefecture name characters taught in primary school (20 kanji)
– Secondary school (1110 kanji)

The second-to-last category consists of characters which are only taught to students because they appear in the names of prefectures of Japan. According to the Wikipedia page about jōyō kanji, these are all taught in grade 4, but I included them as a separate category after the rest of primary school for two reasons: they seem less relevant to learning Japanese as a second language outside of Japan, and they appear at much lower frequency in the course than other primary school characters.

I looked both at which kanji appear as all or part of the cloze word being tested, and at which appear elsewhere in the sentence. For the most part, kanji which appear in some exercises elsewhere in the sentence also appear in at least one exercise as part of the cloze word. This makes sense given that the cloze word is supposed to be the least common word in the sentence.
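The two kinds of counts described above could be collected along these lines. This is a sketch under assumptions, not the code from the repository: it assumes each exercise is available as a `(sentence, cloze_word)` pair, that the sentence text includes the cloze word, and that kanji are the characters in U+4E00 to U+9FFF. The sample sentences are invented for illustration.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: kanji = CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def count_kanji(exercises):
    """For each kanji, count the number of exercises where it appears
    in the cloze word and the number where it appears anywhere in the sentence."""
    cloze_counts = Counter()
    sentence_counts = Counter()
    for sentence, cloze in exercises:
        # Sets ensure a kanji is counted at most once per exercise.
        cloze_counts.update({ch for ch in cloze if is_kanji(ch)})
        sentence_counts.update({ch for ch in sentence if is_kanji(ch)})
    return cloze_counts, sentence_counts


# Hypothetical exercises: (full sentence, cloze word).
exercises = [
    ("猫が好きです", "猫"),
    ("私は猫を見た", "見"),
    ("猫は魚が好きだ", "魚"),
]
cloze_counts, sentence_counts = count_kanji(exercises)

# Kanji that are seen in sentences but never tested as part of a cloze word.
never_tested = set(sentence_counts) - set(cloze_counts)
print(never_tested)
```

Counting exercises rather than raw occurrences keeps the "at least three sentences" thresholds meaningful even when a character is repeated within a single sentence.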

Here are my results in a cumulative table. For each level, I list the total number of kanji up to and including that level, as well as how many of those kanji:

– appear in the cloze word in at least one sentence
– appear in the cloze word in at least three sentences
– appear anywhere in the sentence (including the cloze word) in at least one sentence
– appear anywhere in the sentence in at least three sentences

level | # kanji | in cloze | in cloze >= 3 times | anywhere in sentence | anywhere in sentence >= 3 times
---|---|---|---|---|---
Grade 1 | 80 | 80 | 80 | 80 | 80
Grade 2 | 240 | 240 | 238 | 240 | 239
Grade 3 | 440 | 440 | 433 | 440 | 439
Grade 4 | 640 | 640 | 621 | 640 | 637
Grade 5 | 825 | 824 | 801 | 824 | 820
Grade 6 | 1006 | 1003 | 963 | 1003 | 995
Prefecture names | 1026 | 1018 | 971 | 1018 | 1005
Secondary school | 2136 | 1981 | 1506 | 1988 | 1739
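A cumulative table of this shape could be produced roughly as follows. This is only a sketch: the function name `cumulative_rows`, the input format, and the toy data are all assumptions made for illustration, and the counts below are invented rather than taken from the course.

```python
def cumulative_rows(levels, cloze_counts, sentence_counts, min_repeat=3):
    """Build one row per level, where every figure covers all kanji
    up to and including that level.

    levels: ordered list of (level name, iterable of kanji)
    cloze_counts / sentence_counts: dicts mapping kanji -> number of sentences
    """
    rows = []
    seen = set()
    for name, kanji in levels:
        seen |= set(kanji)  # accumulate kanji from earlier levels
        rows.append((
            name,
            len(seen),
            sum(1 for k in seen if cloze_counts.get(k, 0) >= 1),
            sum(1 for k in seen if cloze_counts.get(k, 0) >= min_repeat),
            sum(1 for k in seen if sentence_counts.get(k, 0) >= 1),
            sum(1 for k in seen if sentence_counts.get(k, 0) >= min_repeat),
        ))
    return rows


# Toy input: three kanji per level and made-up counts.
levels = [("Grade 1", "一二三"), ("Grade 2", "引羽雲")]
cloze_counts = {"一": 5, "二": 1, "引": 3}
sentence_counts = {"一": 9, "二": 3, "三": 1, "引": 4, "羽": 2}

rows = cumulative_rows(levels, cloze_counts, sentence_counts)
for row in rows:
    print(" | ".join(str(cell) for cell in row))
```

Keeping a running `seen` set is what makes each row cumulative, so the last row always describes the whole list.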

I would say that the coverage is pretty good, at least through primary school! 1003 of the 1006 kanji which students are expected to learn by the end of grade six are taught by the course. The three which do not appear are 俵 “straw bag,” 后 “queen,” and 蚕 “silkworm.” I’m not sure how common these characters are in Japanese or why they’re not included in the course. Also, of those 1006 characters that Japanese students know by grade 6, all but 11 of them appear three or more times in the course, giving exposure to them in a variety of contexts.

Through secondary school, there are about 150 jōyō kanji missing from the course entirely, and hundreds more which are not seen more than once or twice. Depending on what level you aim to achieve, this may suggest that the Japanese course is not fully complete, but looked at another way, you will be exposed to over 90% of the jōyō kanji by the end of the course! I’m optimistic that if I do manage to complete the course, it will leave me with a pretty substantial vocabulary that could easily be built upon simply by immersion and media consumption.
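For anyone who wants to check the arithmetic behind these figures, here is a short worked example. The numbers are copied from the last row and the Grade 6 row of the table above; the variable names are just for illustration.

```python
# Figures from the "Secondary school" row of the table.
total_joyo = 2136
seen_anywhere = 1988
seen_anywhere_3_plus = 1739

missing = total_joyo - seen_anywhere             # never appear in the course
rarely_seen = seen_anywhere - seen_anywhere_3_plus  # appear only once or twice
coverage_pct = round(100 * seen_anywhere / total_joyo, 1)

# Figures from the "Grade 6" row (all primary school grades, excluding
# the prefecture-name characters).
primary_total = 1006
primary_seen = 1003
primary_pct = round(100 * primary_seen / primary_total, 1)

print(missing, rarely_seen, coverage_pct, primary_pct)
```

So "about 150" is 148 kanji, "hundreds more" is 249, and "over 90%" works out to roughly 93% of the full list, versus roughly 99.7% for the primary school grades.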

If you’d like to see the codebase I used to generate these results, check it out here: https://github.com/cstuartroe/clozemaster-analysis

Any follow-up analyses I should do? Maybe use kanji frequency lists instead of the education schedule? Take a look at other sentence collections on Clozemaster?

If you’d like to follow along with my attempt to speedrun the course, you can join the Discord: https://discord.gg/9S5GeUGM. I’ll post when I livestream, and I’ll share other analytics and thoughts about Clozemaster as a learning tool as I go.
