
Using Whisper to transcribe audio

June 7, 2023
3 comments

Whisper (the new transcription model from OpenAI) seems like a good option for transcribing audio. Here I’m using it to transcribe the audio from the Japanese dub of For All Mankind (dub subtitles often don’t match the audio). There’s no sound or video from the show because of DRM, but you should get the idea. It’s useful for going back and checking a word you didn’t understand, for example. I’m using [this implementation](https://github.com/ggerganov/whisper.cpp) of Whisper.

https://imgur.com/a/567o5JP
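For anyone who would rather work from a file than transcribe live, here is a rough sketch of turning Whisper output into an .srt subtitle file. It assumes the `openai-whisper` Python package (with ffmpeg installed) rather than the whisper.cpp setup linked above, and the function names, file paths, and the choice of the `large` model are only illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments shaped like Whisper's output:
    dicts with 'start' and 'end' (seconds) and 'text'."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, srt_path: str, model_name: str = "large") -> None:
    """Transcribe Japanese audio and write the result as an .srt file.

    Assumption: `pip install openai-whisper` has been run; the import is
    kept inside the function so the helpers above work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language="ja")
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

Calling something like `transcribe_to_srt("episode.mp3", "episode.srt")` (hypothetical file names) should give a subtitle file that most players can load. The timings and text still deserve a manual pass before you study from them.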

3 comments

ExtramaritalGaming says:
June 7, 2023 at 7:10 pm
Whisper has been out for quite a while at this point, so unless they’ve made some sort of stealth update to their models, their largest model is still only ~85% accurate for Japanese. If your listening isn’t good enough to catch that 15% without subtitles, you probably shouldn’t be using this; if your listening is good enough to catch that 15%, you should probably just be watching without subtitles. This is honestly pretty niche. I used it myself for a couple of shows and then never used it again because it just wasn’t worth the extra effort.
AlbaNemori says:
June 7, 2023 at 7:23 pm
I’ve been doing this for months and it’s awesome for generating subtitles for things that don’t have any! The only difference is that I use Whisper + Whisper-WebUI to generate a .srt file (I don’t own the necessary hardware to transcribe it live), then load it into a player like Memento, which lets you hover over words and easily create Anki cards. The only thing that sucks is that it’s kinda inaccurate with the sub timings (and sometimes gets the wrong words), but it’s vastly better than YouTube auto-generated subs.
TraditionalEase says:
June 8, 2023 at 1:48 am
I did a few tests with this thing when it first came out and had really mixed results. At least at the time, any sort of speech at native speed with more than one speaker talking “naturally” (e.g. sometimes talking over each other, interrupting, bouncing off what the other person just said) seemed to really confuse it, and it would just start outputting wildly incorrect things after a while.
It might be better for professionally recorded audio, but I’d still only use it to generate the “rough draft” subtitle to go over manually and fix transcription issues rather than something to learn off of.
