On jpdb’s word counts

I’ve never used jpdb’s SRS or vocab lists, but I have used its word counts to compare how difficult different works are.

I’ve been curious about its counting method. I played the Fate/stay night VN and mined almost every word I didn’t know, and I’ve mined from a lot of other things too, yet I’m only at 11k cards, while jpdb says that Fate/stay night has 21k words. Curious about the ten thousand ghost words, I imported my Anki decks and looked at what jpdb said I was missing.

A *TON* of it is stuff that I would consider phrases, conjugations, collocations, or grammar. For example:

* までもない
* なんです
* ですから
* どうせだから
* してやる
* でこぼこコンビ
* あるらしい
* こうする
* になると
* となれば
* となると
* からには
* それはそれで
* これぐらい
* だからといって
* いつのまに
* そこから
* バランスをとる
* メガネをかける

It also likes to count the same word or phrase separately in different orthographies. There were a lot of words in hiragana that I had already learned in kanji, and sometimes separate hiragana and katakana versions, like:

* くせになる
* クセになる
* ハメになる
* はめになる

There were also a *TON* of katakana loanwords that I felt were too obvious to bother making cards for:

* パトロール
* コマンド
* コントローラー

Additionally, there were katakana phrases. Even if I felt the need to learn their component parts, I personally wouldn’t learn these as combinations:

* パーフェクトワールド
* ラッキーナンバー

There were also a lot of slurred forms, contractions, and other distortions of words, each counted as a separate word:

* すっごい
* そりゃ
* おっきい
* あったかい
* しょうがねえ
* んまい

Somewhat overlapping with the katakana words were proper nouns: a bunch of place names and the names of all the heroes.

There were also genuine words that I chose not to mine for whatever reason, and some complete misparses.

I’m not necessarily saying this is a problem with jpdb. Maybe it’s your style to learn all of these as individual words, and if so, I don’t think there’s anything wrong with that. If I learned all of them with flashcards, I’d be able to get through far more flashcards a day, so it would probably even out in the end.

I’m mostly just saying: if you’re like me and only make flashcards for things that seem significant, don’t be too discouraged by jpdb’s giant 21k “words”, since its definition of “word” is pretty different from yours. You don’t need 21k flashcards to be able to understand Fate.

This part is just a theory, but I think this might also make longer works seem disproportionately difficult, since they are more likely to use inconsistent orthography and more collocations, which artificially inflates the word count.
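To make that theory a bit more concrete, here is a toy sketch in Python. The token list and the variant map are made up for illustration, and this is not how jpdb actually parses or counts text; it only shows how counting every spelling separately inflates a unique-word count compared to merging variants first.

```python
from collections import Counter

# Toy token stream, made up for illustration (not real jpdb data).
tokens = ["くせになる", "クセになる", "癖", "くせ", "パトロール", "すごい", "すっごい", "すごい"]

# Hypothetical variant map: spelling/colloquial variant -> canonical form.
# Real dictionaries and parsers have their own (much bigger) rules.
canonical = {"クセになる": "くせになる", "くせ": "癖", "すっごい": "すごい"}

# Every distinct spelling counted as its own "word".
raw_counts = Counter(tokens)

# Variants folded into one entry before counting.
merged_counts = Counter(canonical.get(t, t) for t in tokens)

print(len(raw_counts))     # 7
print(len(merged_counts))  # 4
```

Even in eight tokens, three variant spellings turn four "real" words into seven; over a long VN with lots of inconsistent spelling, that gap could plausibly grow.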

6 comments
  1. Yeah, the cool thing about jpdb to me is being able to see coverage on a work, and that does tend to include the easier words. In that way it feels different from when I used Anki to mine. You can just mark them as “never forget” in jpdb so you don’t even need to review them; it’s really nice!

  2. The redundant cards are a thing, but a fair number of them are best learned separately anyway, because kanji usage is inconsistent across material. I feel slang terms are important to recognize and be able to read. Sure, there may be katakana loanwords that you think are easy, but パトロール was not one that made instant sense to me. In context, perhaps, but seeing words as isolated cards, or running into them unexpectedly, is one thing I like about how jpdb works. Also, if it’s so obvious, just hit ‘Easy’ and move on. It will only show up a few times, and maybe in a month it won’t be so obvious anymore.

  3. You are correct in that regard. However, it’s also worth mentioning that you won’t encounter every single word you’ve learned in each title. In fact, apart from very frequent words, you’d only encounter a small portion of them, which balances out this inflated number. In the end, you do need 21k flashcards to be able to understand Fate. I personally underestimated just how many words there are in Japanese.

  4. > but I have used its word counts to compare difficulties […] but I’m only at 11k cards […] You don’t need 21k flashcards to be able to understand fate.

    You don’t, because you’re looking at the wrong metric, even *if* we assume the unique words are overcounted. (:

    You’re supposed to look at [these stats](https://jpdb.io/visual-novel/972/fate-stay-night/stats) to figure out how many words you need to comfortably read it. You can see that 97% to 98% coverage falls somewhere between 11k and 13k unique words, which coincidentally matches how many cards you have mined.

  5. Yeah, I more or less agree. It’s very hard to count how many words we really need. Languages usually have many nuances; look at expressions like “book full of meat” or “good delivery” (in the sense of diction). In my opinion, these diverge a bit from the core meaning of each word. Learning such expressions isn’t the same as learning completely unknown words, but at the same time I can’t say they take no effort to memorize. They’re just easier. So I personally split vocabulary into the “word–meaning” pair and indirect information. There is a lot of indirect information related to each word: for example, apples are associated with eating, peeling, and cutting, but not reading. Similarly, they are associated with taste, appearance, and other things. A word can also have other meanings, either on its own (as a brand) or in combination with other words, as in “a bad apple spoils the barrel”. Personally, I consider this kind of indirect information part of a middle-fluency stage, and it is easier to learn simply by using the language extensively.
