Is there an app that splits a Japanese wall of text into words?

I used the Windows 10 dictation tool to convert the speech in a Japanese video to text (i.e., to make subtitles). What I got is a wall of text without spaces or punctuation marks. I was wondering whether there is an app that can split this wall into coherent words.

2 comments
  1. As far as I know, there isn't a program that does that.

    Your best bet is to paste the text into [ichi.moe](https://ichi.moe) and start making your own split-up subtitles.

    **Japanese** usually does not have a **delimiter between words.**

    **If you know Python**, you can use a word-segmentation library ([https://investigate.ai/text-analysis/splitting-words-in-east-asian-languages/](https://investigate.ai/text-analysis/splitting-words-in-east-asian-languages/)).

  2. There are a lot of ways. For example, [ichi.moe](https://ichi.moe) has decent segmentation. You can also use something like Translation Aggregator with the Jparser option turned on; the text will then not only be split but also colored, like this:

    [https://user-images.githubusercontent.com/61393492/94332158-3b1ba200-ff98-11ea-9d26-4fc3e0fe7992.png](https://user-images.githubusercontent.com/61393492/94332158-3b1ba200-ff98-11ea-9d26-4fc3e0fe7992.png)

    Words are green, particles are purple, and set phrases are combined; for example, これから appears as a single block. In my experience, though, automatic segmentation makes mistakes, sometimes quite often, so you need at least basic grammar knowledge to spot and fix them.
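The core idea of dictionary-based segmentation can be sketched in a few lines of Python. This is a minimal illustration, not how ichi.moe, Jparser, or the libraries in the linked article actually work: it does greedy longest-match splitting against a tiny hand-made word list, `HYPOTHETICAL_DICT`, using a function `segment` introduced here purely for illustration. It splits a simple sentence correctly, keeps これから as one block when that phrase is in the list, and mis-splits the classic tongue twister すもももももももものうち, which is the kind of error described in the second comment.

```python
# A minimal sketch of greedy longest-match word segmentation.
# ASSUMPTION: HYPOTHETICAL_DICT is a tiny hand-made word list for illustration only;
# real tools use large dictionaries and smarter scoring.

HYPOTHETICAL_DICT = {
    "今日", "は", "いい", "天気", "です", "ね",
    "これ", "から", "これから",
    "すもも", "も", "もも", "の", "うち",
}


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Split text by always taking the longest dictionary word at the current position.

    Characters that start no dictionary word are emitted one at a time.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            words.append(text[i])
            i += 1
    return words


if __name__ == "__main__":
    print(segment("今日はいい天気ですね", HYPOTHETICAL_DICT))
    # → ['今日', 'は', 'いい', '天気', 'です', 'ね']
    print(segment("これから", HYPOTHETICAL_DICT))
    # → ['これから']  (the set phrase stays one block)
    print(segment("すもももももももものうち", HYPOTHETICAL_DICT))
    # → ['すもも', 'もも', 'もも', 'もも', 'の', 'うち']
    # Wrong: the intended split is すもも/も/もも/も/もも/の/うち (李も桃も桃のうち).
```

Greedy matching commits to the longest word at each step and never reconsiders. Analyzers such as MeCab instead build a lattice of candidate words and choose the lowest-cost path using costs learned from annotated text, which handles many ambiguous cases better, but their output still needs a human check.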
