I’m trying to **auto-generate and add** some kind of **furigana** to what are essentially plain text files. The output format doesn’t have to conform to anything in particular; I’ll just post-process it.
I’m OK with text manipulation and with working on HTML or subtitle files (srt, vtt, ass, etc.). I can take these files and add or remove tags as necessary.
Ultimately I want to take a batch of vtt subtitle files that I downloaded from Netflix (there are Chrome extensions etc. that let you do this) and add furigana to them. Whether the furigana is added in brackets, in <ruby> tags, or in any other form doesn’t matter. The point is that I have 100+ essentially plain(ish) text files and I want to add some kind of hiragana/furigana information to supplement the kanji. So bulk, scripted, command-line-style processing would be ideal.
I know there are websites that do kanji-to-furigana conversion, as well as browser extensions and Anki add-ons. These aren’t what I want, because I’m not dealing with a single webpage or Anki deck, and I don’t want to copy and paste text into a website many, many times for online conversion.
And I know that there are programming libraries; mostly I see recommendations for MeCab and its variants/wrappers, which are probably used at the back end of the above websites and extensions.
… but is there actually any Windows/Linux command-line script or application that I can feed a plain text file or similar (srt/ass/vtt subtitle) and get furigana output in any format?
I don’t mind a little coding or scripting, but I don’t want to reinvent the wheel and write a lot of code to get multiple libraries working together in a complex way if I don’t need to, for a problem that’s essentially been solved many times before (for websites, Anki add-ons, and browser extensions).
Any advice? For instance, if MeCab is the best tool for the job (I’m not sure), is there any simple command-line use of MeCab?
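To make the goal concrete, here is roughly the kind of script I have in mind. It is only a minimal Python sketch, not a tested solution. It assumes the `mecab` command is on the PATH with an IPAdic-style dictionary, where (as far as I understand) the katakana reading is the eighth comma-separated feature field; other dictionaries such as UniDic lay their fields out differently. The names `mecab_tokens`, `annotate`, and `annotate_vtt` are made up for illustration.

```python
import re
import subprocess

# CJK unified ideographs plus the iteration mark 々
KANJI = re.compile(r"[\u4e00-\u9fff\u3005]")


def kata_to_hira(text):
    """Convert katakana (ァ..ヶ) to hiragana; other characters pass through."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def mecab_tokens(text):
    """Hypothetical helper: run the mecab CLI, return (surface, reading) pairs.

    Assumption: IPAdic-style output, "surface<TAB>f0,f1,...,f8", with the
    katakana reading in field index 7. Unknown words may have no reading.
    Note that MeCab drops whitespace between tokens.
    """
    result = subprocess.run(
        ["mecab"], input=text + "\n",
        capture_output=True, encoding="utf-8", check=True,
    )
    tokens = []
    for row in result.stdout.splitlines():
        if row == "EOS" or "\t" not in row:
            continue
        surface, features = row.split("\t", 1)
        fields = features.split(",")
        tokens.append((surface, fields[7] if len(fields) > 7 else ""))
    return tokens


def annotate(text, tokenize):
    """Append a bracketed hiragana reading after every token containing kanji."""
    parts = []
    for surface, reading in tokenize(text):
        if reading and reading != "*" and KANJI.search(surface):
            parts.append(f"{surface}({kata_to_hira(reading)})")
        else:
            parts.append(surface)
    return "".join(parts)


def annotate_vtt(vtt_text, tokenize):
    """Annotate cue text only; leave header, notes, cue ids and timings alone."""
    out = []
    for line in vtt_text.splitlines():
        s = line.strip()
        is_meta = (
            not s
            or s.startswith(("WEBVTT", "NOTE"))
            or "-->" in s
            or s.isdigit()
        )
        out.append(line if is_meta else annotate(line, tokenize))
    return "\n".join(out)
```

The idea would be to call something like `annotate_vtt(text, mecab_tokens)` on each file in a loop. The tokenizer is passed in as a parameter so a different back end could be swapped in without touching the subtitle handling, and the bracketed readings could be turned into <ruby> tags in post-processing.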
[edit: clarify auto-generate]
by bgaskin