What makes 歩 and 步 two Unicode entries, while 市 in kanji and hanzi is the same character?

I don’t know if this sub is the correct place to ask it.

What makes 歩 (U+6B69) and 步 (U+6B65) two separate Unicode entries, while 市 (U+5E02), written with a vertical dot in kanji and a slanting dot in hanzi, is a single character? After all, there’s also 巿 (U+5DFF), which has a vertical dot in hanzi.


|character|unicode|hanzi?|kanji?|
|:-|:-|:-|:-|
|[市](https://en.wiktionary.org/wiki/%E5%B8%82)|[U+5E02](https://util.unicode.org/UnicodeJsps/character.jsp?a=5E02)|Yes|Yes|
|[巿](https://en.wiktionary.org/wiki/%E5%B7%BF)|[U+5DFF](https://util.unicode.org/UnicodeJsps/character.jsp?a=5DFF)|Yes|?|
|[步](https://en.wiktionary.org/wiki/%E6%AD%A5)|[U+6B65](https://util.unicode.org/UnicodeJsps/character.jsp?a=6B65)|Yes|kyujitai|
|[歩](https://en.wiktionary.org/wiki/%E6%AD%A9)|[U+6B69](https://util.unicode.org/UnicodeJsps/character.jsp?a=6B69)|no|shinjitai|
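
For anyone who wants to check the code points in the table, here is a minimal Python sketch using only the standard library. The names `chars`, `describe`, `rows` and `unchanged` are made up for this example. It also checks that NFKC normalization leaves all four characters alone, which is what you would expect if Unicode treats them as distinct, undecomposable ideographs.

```python
import unicodedata

# The four characters from the table, written as escapes so that
# look-alike glyphs cannot be mixed up when copying the snippet.
chars = ["\u5e02", "\u5dff", "\u6b65", "\u6b69"]

def describe(ch):
    """Return the code point label and the Unicode character name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

rows = {ch: describe(ch) for ch in chars}
for ch, (code_point, name) in rows.items():
    print(ch, code_point, name)

# Normalization does not fold any of these into another character.
unchanged = all(unicodedata.normalize("NFKC", ch) == ch for ch in chars)
print("NFKC leaves all four unchanged:", unchanged)
```

Whether U+5E02 shows up with a vertical or a slanting dot depends only on the font used to render it; the snippet sees a single code point either way.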


by Kafatat

7 comments
  1. In Chinese, the character for "sword" has 12 variants in Unicode. As for why some characters differ between hanzi and kanji, it probably has to do with the Japanese kanji reform of 1946 and the later simplification of characters in China. Some kanji are closer to the traditional characters (魚), some to the simplified ones (学校), and some differ from both; in other cases a different character is used for the same word.

  2. If they are the same character but written slightly differently (e.g. slanting vs. vertical dots), this is compensated by choosing an appropriate font. That case does not need separate character entries.

  3. Unicode's handling of Chinese characters isn’t very consistent, partly because it merged several older, separate standards and partly because that’s simply how it was designed.

    For instance, there are no separate entries for Korean simplified forms of Chinese characters, as there are for Japanese and Chinese ones, simply because the original Korean encoding schemes never had them.

  4. A goal of Unicode is to unify characters like these that are basically just variants. In practice, the level of unification is not as high as you might think. For example, Chinese simplified and traditional characters might be encoded separately even if they are the “same” character. The same goes for Japanese kyujitai and shinjitai.

    “Han unification” was pretty controversial at the beginning, so they were not super aggressive about it. There is also the “source separation rule”, which means that if a national character set encodes two variants separately, the distinction needs to be preserved to allow round-trip conversion. Unicode is supposed to be a superset of all the other character sets, without losing anything. There were also huge numbers of characters to look at, and potentially argue about, so sometimes it was just easier to keep them separate.
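
The round-trip idea can be tried out with a short Python sketch. It makes two assumptions: Python's built-in `shift_jis` codec stands in for a Japanese national character set, and five of the "sword" variants (U+5263, U+528D, U+5294, U+5292, U+5271) serve as the example, since as far as I know that set encodes them separately. The helper names `legacy_bytes` and `round_trips` are invented for the example.

```python
# "Sword" variants, written as escapes to avoid mixing up look-alikes.
variants = ["\u5263", "\u528d", "\u5294", "\u5292", "\u5271"]

def legacy_bytes(ch, codec="shift_jis"):
    """Encode ch in a legacy codec; return None if the codec lacks it."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

def round_trips(ch, codec="shift_jis"):
    """True if ch survives Unicode -> legacy bytes -> Unicode unchanged."""
    raw = legacy_bytes(ch, codec)
    return raw is not None and raw.decode(codec) == ch

for ch in variants:
    raw = legacy_bytes(ch)
    shown = raw.hex() if raw is not None else "not in codec"
    print(f"U+{ord(ch):04X}", ch, shown, round_trips(ch))

# By contrast, the unified U+5E02 is one code point that several
# national encodings all map to, each with its own byte sequence.
for codec in ("shift_jis", "big5", "gb2312"):
    print(codec, legacy_bytes("\u5e02", codec).hex(), round_trips("\u5e02", codec))
```

If two variants had been merged into one code point, one of them could not come back from Unicode as the same legacy bytes, and that is the loss the rule is meant to prevent.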
