Mikan: A Tool to Auto-Generate Flashcards from a Japanese Text (with optional frequency restriction!)

I’ve had this project kicking around for a while and I finally cleaned it up (sort of) enough to share.

[https://github.com/moniquemurphy/mikan](https://github.com/moniquemurphy/mikan)

The general idea is: grab a Japanese text, run the script, and get a CSV file of all the vocab words in it, glossed (within reason; parsers aren’t foolproof). You can then treat any text like a guided reader without having to do the glossing manually.
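To give a rough feel for the output step, here is a minimal sketch in Python. It is not the code from the repo: `write_vocab_csv`, the already-tokenized word list, and the tiny `glossary` dict are all made up for illustration, and the real script relies on a parser and a full dictionary file instead.

```python
import csv
import io


def write_vocab_csv(tokens, glossary, out):
    """Write one row per unique word: the word and its gloss (blank if unknown).

    `tokens` stands in for parser output and `glossary` for a dictionary
    lookup; both are hypothetical placeholders for this sketch.
    """
    writer = csv.writer(out)
    seen = set()
    for word in tokens:
        if word in seen:
            continue  # each vocab word only needs one flashcard
        seen.add(word)
        writer.writerow([word, glossary.get(word, "")])


# Toy data standing in for a parsed text and a dictionary.
tokens = ["猫", "魚", "猫", "食べる"]
glossary = {"猫": "cat", "魚": "fish"}

buf = io.StringIO()
write_vocab_csv(tokens, glossary, buf)
print(buf.getvalue())
```

Words the dictionary doesn’t know still get a row with an empty gloss, so you can spot them and fill them in by hand.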

If you don’t want highly frequent words because you know them already, you can restrict the output to, say, words less frequent than the top 1000, or words with a frequency value of > XYZ. If you know a lot about linguistic corpora and that’s something you’re into, go for the frequency value; if not, the rank threshold is pretty easy to understand.
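The rank threshold boils down to something like the sketch below. Again, this is an illustration under assumptions rather than the repo’s code: `filter_by_rank`, the `rank_by_word` dict, and the choice to keep words missing from the frequency table are all mine.

```python
def filter_by_rank(words, rank_by_word, min_rank=1000):
    """Keep words rarer than the top `min_rank` most frequent words.

    `rank_by_word` maps a word to its frequency rank (1 = most frequent).
    Words absent from the table are kept, on the assumption that a word
    missing from the corpus is probably rare. That choice is a guess.
    """
    kept = []
    for word in words:
        rank = rank_by_word.get(word)
        if rank is None or rank > min_rank:
            kept.append(word)
    return kept


# Made-up ranks for illustration only; not real BCCWJ values.
ranks = {"する": 1, "食べる": 350, "蜜柑": 12000}
words = ["する", "食べる", "蜜柑", "謎語"]

print(filter_by_rank(words, ranks, min_rank=1000))
```

With these toy numbers, the two common verbs are dropped and the two rarer words stay on the list.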

Caveats:

* You’ll need a certain base level of computer savvy to install and run it, but everything necessary is linked and free, and I tried to keep the installation explanations at an intermediate level.
* The frequency corpus data from BCCWJ is a large file, but since it required a pretty advanced level of Japanese to even figure out where the CSV file lived, I included it in the repo. It’s free for non-commercial use.
* I didn’t include kana in the output because parsing it from the JMDict file and matching it with the corresponding kanji was not straightforward for words with multiple readings. If this really gets under your skin, I encourage forking 🙂

If you happen to dabble in local EPWING files, there’s some code to use JSONified versions of them as an alternative to the (free and lovely, but less comprehensive) JMDict. I used [https://github.com/FooSoft/zero-epwing](https://github.com/FooSoft/zero-epwing) to convert them to JSON.

I hope this is helpful to someone!
