Research on Kanji Frequency Databases

Hello, folks.

Some of you may be interested in my comparative study of several popular, freely accessible kanji frequency databases. **Usage frequency** is often the main factor in deciding whether to include a kanji in a textbook or other study material. However, many databases rely on only a **single source** (e.g., news) to determine how frequently a character is used. It is therefore possible that these databases are less reliable and consistent than databases that draw on **multiple sources**. Is this true? You will find the answer to this and many other interesting questions in my [study](https://www.researchgate.net/publication/369366663_How_Reliable_and_Consistent_Kanji_Frequency_Databases_Really_Are), published last year in the second issue of the EJCJS.

Feel free to ask me any questions about my research.

1 comment
  1. 1) How did you define consistency and reliability, and why did you decide to define them this way? What are the advantages and limitations of these definitions? Is there related research with a different methodology and similar or different conclusions?

    2) Is a sample size of 5 databases sufficient to make claims about consistency and reliability based on the number of sources? How did you make sure that this is the case?

    3) How exactly did you define “source”? In your post you give “news” as an example of a single source, but there are very different kinds of news sources out there. How did you make sure that your definition of “source” is meaningful and not too arbitrary?

    4) Does your research draw any conclusions that would be useful for a Japanese learning community?
