Background: I'm done with Tae Kim's grammar guide and I'm 2k words into the Core 10k Anki deck (doing reviews every day, of course). I was wondering where diminishing returns set in along the deck. Is there a table or graph that shows usage percentage? Something like “2k words = 50%, 6k = 70%, 10k = 80%”.
>something like “2k words = 50%, 6k = 70%, 10k = 80%” or something
General stats like this are misleading, because they assume you learned words in exactly the frequency order of the material you're interested in. If you learn the top 2,000 words taken from newspapers from the 90s (which is what the Core 2K is), that might give you something like ~80% coverage, but *only* for newspapers from the 90s. Your word coverage for, say, a typical anime will be a lot lower.
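To make the "coverage depends on what you read" point concrete, here is a minimal Python sketch. It assumes coverage means the share of running words (tokens) in a text that are in your known vocabulary, and that each text has already been split into words (Japanese has no spaces, so real tools need a morphological analyzer for that step). The vocabulary and both mini-texts are made-up toy data, and `coverage` is a hypothetical helper, not how any particular site computes its numbers.

```python
from collections import Counter

def coverage(known_words, tokens):
    """Share of running words (tokens) in a text that appear in known_words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Count every occurrence of a known word, so frequent words weigh more.
    known = sum(n for word, n in counts.items() if word in known_words)
    return known / total

# Hypothetical toy data: a tiny newspaper-flavored vocabulary and two
# pre-tokenized texts (one newspaper-like, one anime-like).
known = {"の", "は", "政府", "経済", "する", "だ"}
news_text = ["政府", "は", "経済", "の", "政策", "を", "発表", "する"]
anime_text = ["俺", "は", "海賊王", "に", "なる"]

print(f"news:  {coverage(known, news_text):.1%}")
print(f"anime: {coverage(known, anime_text):.1%}")
```

The same known-word set scores much higher on the newspaper-style text than on the anime line, which is why a single "2k words = X%" figure can't carry over between kinds of media.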
IMO a much better way to do this is via [jpdb.io](https://jpdb.io). It lets you load exactly the media you are interested in along with your Core 10K vocab deck, and it will tell you exactly how much overlap there is and which words are most useful.
You can get these stats for specific novels, anime, etc. on jpdb.io, e.g. https://jpdb.io/visual-novel/996/soukou-akki-muramasa/stats
Generally, 2,000 words is a good place to stop.
It can be slightly misleading, since some words you'll just know without lookups because they're combinations of other words or kanji you already know. Using the example above, I mined every word in the VN that I didn't know or couldn't guess, but my deck didn't get close to 30k cards.