Create Translated Subtitles (Video to srt File) with Google Speech-to-Text and Cloud Translation, from Japanese to English
For the last few days, I studied several ways to transcribe video and found that AI services like Google Speech-to-Text are still the most effective option.
That is because open-source libraries like SpeechRecognition are just wrappers around other services, including Google Speech-to-Text, and they do not allow transcription of long audio. https://reneelin2019.medium.com/convert-extract-speech-from-video-to-text-with-python-d8ebf4ad9734
Truly open-source libraries such as Mozilla DeepSpeech are difficult to use for beginners like me, and they are not as accurate as the AI services. I only tried the pre-trained English model; more time would be required to set up other languages or to train on a custom dataset. https://reneelin2019.medium.com/video-audio-to-text-with-mozilla-deepspeech-2f7c3b3aef1f
With Google’s service, I can now directly create subtitles in other languages. The code is based on their official tutorial files on GitHub (community/tutorials/speech2srt in the GoogleCloudPlatform/community repository) and the YouTube tutorial https://www.youtube.com/watch?v=uBzp5xGSZ6o, which use Speech-to-Text to turn the audio into an srt file and then use the Translation API to translate the text. My complete code is here: https://github.com/reneelin1712/autoTranslation/tree/main/translatedSub
Although the code translates English to Japanese, the opposite direction works the same way. Finally, I can translate cutscenes in FFXIV from Japanese to other languages automatically. By the way, the language code for Japanese is “ja”, not “jp”.
I noticed that accessing the service through Colab is not stable. Sometimes it says “403 Cloud Speech-to-Text API has not been used in project xxx before or it is disabled.” even though I have enabled it, and sometimes it works. Therefore, I switched back to a local environment and ran Python directly.
After creating a virtual environment, only these dependencies are needed: google-cloud-speech, google-cloud-storage, google-cloud-translate and srt. A JSON credential file also has to be downloaded for authentication.
There are several parameters we can customize; they describe the audio file's format and location. Note that the audio file needs to be saved in Google Cloud Storage.
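The Google client libraries look for the credential file through the GOOGLE_APPLICATION_CREDENTIALS environment variable. Here is a minimal sketch of setting it from Python before any client is created. The file name `credentials.json` and the helper `use_credentials` are placeholders of mine, not names from the tutorial.

```python
import os
from pathlib import Path

# Hypothetical file name: replace it with the JSON key downloaded from the Cloud console.
CREDENTIAL_FILE = Path("credentials.json")


def use_credentials(path: Path) -> str:
    """Point the Google client libraries at a service-account key file."""
    resolved = str(path.resolve())
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = resolved
    return resolved


use_credentials(CREDENTIAL_FILE)
```

Setting the variable in the shell before starting Python should work just as well; doing it in code only keeps everything in one script.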
# parameters about the audio file
sample_rate_hertz = 44100
language_code = "en-US"
audio_channel_count = 2
encoding = 'LINEAR16'
out_file = "subtitle"
max_chars = 40
storage_uri = 'gs://' + 'autotrans' + '/' +'ok.wav'
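The sample rate and channel count have to match the actual audio, so it can help to read them from the WAV header instead of typing them by hand. The sketch below does that with Python's built-in `wave` module and also shows one possible way to upload the file with google-cloud-storage. Both helper names are my own, and the upload helper assumes the bucket already exists and the credentials are set up.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hertz, audio_channel_count, sample_width_bytes) of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def upload_to_gcs(local_path, bucket_name, blob_name):
    """Upload a local file to Cloud Storage and return its gs:// URI."""
    # Imported lazily so describe_wav works without google-cloud-storage installed.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"


# Example usage (needs a real file, bucket and credentials):
# sample_rate_hertz, audio_channel_count, _ = describe_wav("ok.wav")
# storage_uri = upload_to_gcs("ok.wav", "autotrans", "ok.wav")
```

For LINEAR16 audio the sample width should come back as 2 bytes (16 bits per sample).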
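The core step is then to call the API and get the response. After that, utility functions break the text into lines and save it as a txt or srt file. The complete code is in ‘autosub.py’.
    from google.cloud import speech

    client = speech.SpeechClient()
    # Encoding of audio data sent, built from the parameters above.
    operation = client.long_running_recognize(
        config={
            "enable_word_time_offsets": True,  # per-word timings, needed for subtitle timestamps
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": language_code,
            "audio_channel_count": audio_channel_count,
            "encoding": encoding,
        },
        audio={"uri": storage_uri},
    )
    # Block until the long-running transcription job finishes.
    response = operation.result()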
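To give an idea of what the utility functions do, here is a rough sketch of grouping timed words into subtitle lines of at most `max_chars` characters and formatting them as srt text. It is not the code from ‘autosub.py’: the input is a simplified list of (word, start_seconds, end_seconds) tuples that I made up for illustration, and the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in srt files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=40):
    """Group (word, start, end) tuples into subtitle lines of at most max_chars."""
    cues = []  # each cue is [text, start_seconds, end_seconds]
    for word, start, end in words:
        if cues and len(cues[-1][0]) + 1 + len(word) <= max_chars:
            # The word still fits: extend the current line and its end time.
            cues[-1][0] += " " + word
            cues[-1][2] = end
        else:
            # Start a new subtitle line with this word.
            cues.append([word, start, end])
    blocks = [
        f"{i}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{text}\n"
        for i, (text, s, e) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Joining with a space suits English; for Japanese audio the words would probably be joined without it.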
Translate text file
After getting the text file, we can call the Translation API to translate it. The batch API takes txt or tsv files rather than srt files, which is why we need another step afterwards to convert the translated txt back to srt. We call the function ‘batch_translate_text’:
    # request holds the parent, the language codes, and the input and output configs
    operation = client.batch_translate_text(request=request)
Then we will have two files: a txt file in the target language and an index.csv file.
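The request that produces these files can be sketched roughly as follows. The field names are those of the Translation v3 batch request as I understand them; the project ID, bucket paths and helper names are placeholders, and `us-central1` is an assumption based on the tutorial's default location.

```python
def build_batch_request(project_id, input_uri, output_uri,
                        source_lang="en", target_langs=("ja",),
                        location="us-central1"):
    """Build the request dict for TranslationServiceClient.batch_translate_text."""
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "source_language_code": source_lang,
        "target_language_codes": list(target_langs),
        "input_configs": [
            {"gcs_source": {"input_uri": input_uri}, "mime_type": "text/plain"}
        ],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri}},
    }


def run_batch_translation(request, timeout=180):
    """Send the request and block until the batch job finishes."""
    # Imported lazily so the request builder can be tried without the library.
    from google.cloud import translate

    client = translate.TranslationServiceClient()
    operation = client.batch_translate_text(request=request)
    return operation.result(timeout)


# Example usage (placeholder project and bucket names, needs credentials):
# request = build_batch_request("my-project", "gs://my-bucket/subtitle.txt", "gs://my-bucket/out/")
# run_batch_translation(request)
```

Swapping `source_lang` and `target_langs` is all it takes to go from Japanese to English instead.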
Align text file with original srt file to output srt file in another language
The last step is to convert the txt file from the previous step to an srt file. The complete code is in ‘txt2srt.py’.
    def process_translations(subs, indexfile):
        # read index.csv and update the subtitles for each translated file
        print("Updating subtitles for each translated language")
        with open(indexfile) as f:
            lines = f.readlines()
        # copy orig subs list and replace content for each line
        for line in lines:
            index_list = line.split(",")
            lang = index_list[1]  # target language code
            langfile = index_list[2].split("/")[-1]  # translated file name without its gs:// path
            # langfile = '/'+langfile
            lang_subs = update_srt(lang, langfile, subs)
        return


    def update_srt(lang, langfile, subs):
        # change subtitles' content to translated lines
        with open(langfile) as f:
            lines = f.readlines()
        i = 0
        for line in lines:
            subs[i].content = line
            i += 1
        return subs
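If you want to check what index.csv contains before running the whole script, a small sketch with the standard `csv` module could look like this. It assumes the column order is input file, target language code, translated file, followed by error-file columns; `read_index` is a hypothetical helper, not part of ‘txt2srt.py’.

```python
import csv
import io


def read_index(index_text):
    """Return (language_code, local_file_name) pairs from index.csv content."""
    pairs = []
    for row in csv.reader(io.StringIO(index_text)):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        lang = row[1].strip()
        # Keep only the file name, dropping the gs://bucket/... prefix.
        langfile = row[2].strip().split("/")[-1]
        pairs.append((lang, langfile))
    return pairs
```

The function takes the file content as a string, so it can be called as `read_index(open("index.csv").read())` once the file has been downloaded from the bucket.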
These services save a lot of transcription and translation work; now I only need to import the srt files into Adobe Premiere Pro for further adjustment.