What would you consider a high enough frequency number on “innocent corpus”?


I’m just going through texts with Yomichan and I’m wondering where I should draw the line on which words to add to my Anki decks. Which numbers on the innocent corpus would you consider rare or common? Thanks.

  1. I don’t know what exactly it was based on, but in the past it was suggested on this subreddit to only mine words with a frequency-rank smaller than 2 times the total number of words you currently know. So if you currently know 1500 words, you would only mine words with a frequency-rank smaller than 3000. This way you give your vocabulary “room to grow” while still prioritising higher frequency words. I cannot say that I am doing it *exactly* like this myself, but if you are looking for one, this sounds like a pretty good general rule to me.

    Edit: Here is a link to a version of the innocent corpus frequency dictionary that provides a rank instead of the total number of occurrences: [https://www.reddit.com/r/LearnJapanese/comments/nhc6bh/i_made_a_ranked_version_of_the_innocent_corpus/](https://www.reddit.com/r/LearnJapanese/comments/nhc6bh/i_made_a_ranked_version_of_the_innocent_corpus/)

  2. I don’t think I ever referred to the innocent corpus, but a reasonable measure is doing a search for a term if it’s not marked in a dictionary as “common” and subjectively sounds either dated or niche. Sometimes the combination of kanji used is a good indicator, too.

    Lastly, it depends on your interests and the material you’re interacting with. If it’s a specialized topic like military, science or magic in light novels, it’s likely these terms won’t appear outside of the topic, but might still be relevant to it.

  3. I mine everything under 15000, plus any word that I find interesting for some reason.
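
For anyone who wants to apply the "twice your known vocabulary" rule from the first reply mechanically, here is a minimal sketch in Python. It assumes you are reading rank values (lower means more common), as in the ranked version of the dictionary, not raw occurrence counts. The function names and the `factor` parameter are made up for illustration and are not part of Yomichan or Anki.

```python
def mining_cutoff(known_words: int, factor: int = 2) -> int:
    """Highest rank (exclusive) worth mining under the rule of thumb."""
    return factor * known_words


def should_mine(rank: int, known_words: int, factor: int = 2) -> bool:
    """True if the word's frequency rank falls below the cutoff.

    rank: position in a ranked frequency list (1 = most common).
    known_words: rough count of words you already know.
    """
    return rank < mining_cutoff(known_words, factor)


if __name__ == "__main__":
    known = 1500  # the example figure from the reply above
    print(mining_cutoff(known))
    for rank in (800, 2999, 3000, 12000):
        print(rank, should_mine(rank, known))
```

With 1500 known words the cutoff is 3000, so a word ranked 2999 passes and one ranked 3000 does not. Raising `factor` loosens the filter; a fixed cutoff like the one in the third reply amounts to ignoring `known_words` entirely.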
