Create Translated Subtitles (Video to SRT File) with Google Speech-to-Text and Cloud Translation, from Japanese to English

Renee LIN
4 min read · Oct 27, 2021


Over the last few days I looked into several ways to transcribe video and found that cloud AI services such as Google Speech-to-Text are still the most effective option.

That is because open-source libraries like SpeechRecognition are mostly wrappers around other services, including Google Speech-to-Text, and they do not allow transcription of long audio.

Truly open-source engines such as Mozilla DeepSpeech are difficult for beginners like me to use, and in my experience they are not as accurate as the AI services. I only tried the pre-trained English model; setting up other languages, or training on a custom dataset, would take more time.

With Google’s services I can now create subtitles in other languages directly. The code is based on their official tutorial files on GitHub and their YouTube tutorial, which use Speech-to-Text to transcribe the audio into an srt file and then use the Translation API to translate the text. My complete code is here

Although the code translates English to Japanese, the opposite direction works the same way. I can finally translate FFXIV cutscenes from Japanese into other languages automatically. By the way, the language code for Japanese is “ja”, not “jp”.

I noticed that accessing the service through Colab is not stable. Sometimes it reports “403 Cloud Speech-to-Text API has not been used in project xxx before or it is disabled.” even though I have enabled the API, and sometimes it works. I therefore switched back to a local Python environment.


After creating a virtual environment, only these dependencies are needed: google-cloud-speech, google-cloud-storage, google-cloud-translate and srt. A JSON credential file also has to be downloaded for authentication.
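The Google client libraries pick up the JSON key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. Here is a minimal sketch of setting it from Python before any client is created. The helper name use_credentials is my own placeholder, not something from the tutorial code.

```python
import os
from pathlib import Path


def use_credentials(json_path):
    """Point the Google client libraries at a service-account key file.

    `use_credentials` is a hypothetical helper; the libraries themselves
    only read the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    path = Path(json_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"credential file not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Calling it once at the top of the script, with the path of the downloaded key, should be enough for the speech, storage and translation clients alike.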



There are several parameters we can customize; they describe the audio file’s format and location. Note that the audio file needs to be stored in Google Cloud Storage.

# parameters about the audio file
sample_rate_hertz = 44100
language_code = "en-US"
audio_channel_count = 2
encoding = 'LINEAR16'
out_file = "subtitle"
max_chars = 40
storage_uri = 'gs://' + 'autotrans' + '/' +'ok.wav'
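The sample rate and channel count have to match the actual audio file. For a local WAV file they can be read from the header before uploading. This is a small sketch using Python’s built-in wave module; describe_wav is a name I made up, and it assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return the header values that the recognition parameters must match."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "audio_channel_count": wav.getnchannels(),
            # LINEAR16 means 16-bit samples, so this should come out as 16
            "bits_per_sample": wav.getsampwidth() * 8,
        }
```

If the numbers differ from the parameters above, it is probably easier to fix the parameters than to re-encode the audio.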

The core step is then to call the API and get the response. After that, utility functions break the text into lines and save it as a txt or srt file. The complete file is the ‘’

from google.cloud import speech

client = speech.SpeechClient()
operation = client.long_running_recognize(
    config={
        "enable_word_time_offsets": True,
        "enable_automatic_punctuation": True,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "audio_channel_count": audio_channel_count,
        "encoding": encoding,  # encoding of the audio data sent
    },
    audio={"uri": storage_uri},
)
# block until the long-running transcription has finished
response = operation.result()
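To give an idea of what the utility functions do with the word time offsets, here is a simplified stand-in (not the helper from the complete code). It assumes the words have already been pulled out of the response as (word, start, end) tuples with times in seconds, and it joins words with spaces, which suits English better than Japanese. The names srt_timestamp and words_to_srt are my own.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in srt files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=40):
    """Group (word, start, end) tuples into subtitle lines of limited length."""
    entries = []  # each entry is [start, end, text]
    for word, start, end in words:
        if entries and len(entries[-1][2]) + 1 + len(word) <= max_chars:
            # the word still fits: extend the current subtitle line
            entries[-1][1] = end
            entries[-1][2] += " " + word
        else:
            # start a new subtitle line at this word
            entries.append([start, end, word])
    blocks = [
        f"{i}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{text}\n"
        for i, (s, e, text) in enumerate(entries, start=1)
    ]
    return "\n".join(blocks)
```

The max_chars argument plays the same role as the max_chars parameter set earlier: a subtitle line is closed as soon as the next word would push it past that length.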

Translate text file

After getting the text file, we can call the Translation API to translate it. Only txt or tsv files are accepted as input, which is why another step is needed afterwards to convert the translated txt back to srt. We call the function ‘batch_translate_text’:

from google.cloud import translate

client = translate.TranslationServiceClient()
operation = client.batch_translate_text(
    request={
        "parent": parent,
        "source_language_code": source_lang,
        "target_language_codes": target_language_codes,
        "input_configs": input_configs,
        "output_config": output_config,
    }
)

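The arguments passed to batch_translate_text could be assembled roughly like this. It is only a sketch that builds the dictionary and does not call the API. The function name build_batch_request and its parameters are placeholders of mine, and as far as I know batch translation needs a regional location such as us-central1 and an output prefix that ends with a slash.

```python
def build_batch_request(project_id, location, bucket, input_name,
                        output_prefix, source_lang, target_langs):
    """Assemble the request dict for batch_translate_text (no API call here)."""
    # the output prefix is normalised to end with exactly one "/"
    output_uri = f"gs://{bucket}/{output_prefix.strip('/')}/"
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "source_language_code": source_lang,
        "target_language_codes": list(target_langs),
        "input_configs": [
            {
                "gcs_source": {"input_uri": f"gs://{bucket}/{input_name}"},
                "mime_type": "text/plain",  # the plain txt file from the previous step
            }
        ],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri}},
    }
```

For Japanese to English, source_lang would be "ja" and target_langs would be ["en"].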
We then have two files: a txt file in the target language and an index.csv file.
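The index.csv file lists which translated file belongs to which language. Here is a small sketch of reading it with the csv module. It assumes the second column holds the language code and the third column holds the path of the translated file, and that the file has already been downloaded next to the script. read_translation_index is a hypothetical name.

```python
import csv


def read_translation_index(indexfile):
    """Yield (language_code, local_file_name) pairs from index.csv."""
    with open(indexfile, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            lang = row[1].strip()
            # keep only the file name, dropping the gs://bucket/... prefix
            filename = row[2].strip().split("/")[-1]
            yield lang, filename
```

Using csv.reader instead of splitting on commas by hand also takes care of the trailing newline on each row.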

Align text file with original srt file to output srt file in another language

The last step is to convert the txt file from the previous step into an srt file. The complete code is in ‘’

def process_translations(subs, indexfile):
    # read index.csv and foreach translated file,
    print("Updating subtitles for each translated language")
    with open(indexfile) as f:
        lines = f.readlines()
    # copy orig subs list and replace content for each line
    for line in lines:
        index_list = line.split(",")
        lang = index_list[1]
        langfile = index_list[2].split("/")[-1]
        # langfile = '/'+langfile
        lang_subs = update_srt(lang, langfile, subs)
        write_srt(lang, lang_subs)


def update_srt(lang, langfile, subs):
    # change subtitles' content to translated lines
    with open(langfile) as f:
        lines = f.readlines()
    i = 0
    for line in lines:
        subs[i].content = line
        i += 1
    return subs
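One thing to watch out for: update_srt overwrites the original subs in place and assumes the translated file has exactly one line per subtitle. Here is a more defensive variant as a sketch. Cue is just a stand-in for a subtitle object with a content attribute, and apply_translation is a name I made up.

```python
import copy
from dataclasses import dataclass


@dataclass
class Cue:
    """Stand-in for a subtitle object with a `content` attribute."""
    index: int
    content: str


def apply_translation(subs, translated_lines):
    """Return a copy of subs whose content is replaced line by line."""
    lines = [ln.rstrip("\n") for ln in translated_lines]
    while lines and lines[-1] == "":
        lines.pop()  # ignore trailing blank lines
    if len(lines) != len(subs):
        raise ValueError(
            f"{len(lines)} translated lines for {len(subs)} subtitles"
        )
    new_subs = copy.deepcopy(subs)  # leave the original subtitles untouched
    for sub, line in zip(new_subs, lines):
        sub.content = line
    return new_subs
```

Working on a copy keeps the original subtitles intact when several target languages are processed one after another, and the length check fails early instead of silently shifting lines against their timestamps.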

These services save a lot of transcription and translation work. Now I only need to import the srt files into Adobe Premiere Pro for further adjustment.


