Convert Video/Audio to Text with Mozilla DeepSpeech

Prepare the audio file

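!pip3 install ffmpeg
# the pip package does not ship the ffmpeg binary; the ffmpeg command-line tool must also be on the PATH
# convert video to audio (16 kHz, mono, no video stream)
!ffmpeg -i dogs.mp4 -b:a 64K -ar 16000 -ac 1 -vn dogs_audio.wav
# if you already have audio but not in the right format
!ffmpeg -i dogs.wav -vn -ar 16000 -ac 1 dogs_audio.wav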
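
Before running the model, it can help to confirm that the converted file really has the format requested with the ffmpeg flags above (16 kHz, one channel, 16-bit samples). The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_for_deepspeech` and its `expected_rate` parameter are illustrative and not part of DeepSpeech.

```python
import wave


def check_wav_for_deepspeech(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file looks usable.

    Hypothetical helper: the expected values mirror the ffmpeg flags
    used above (-ar 16000 -ac 1) plus 16-bit PCM samples.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels found, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected 2 (16-bit)"
            )
    return problems
```

Calling `check_wav_for_deepspeech('dogs_audio.wav')` should return an empty list if the conversion worked; any entries describe what to fix by re-running ffmpeg.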

Install DeepSpeech & download pre-trained model

!pip3 install deepspeech
# download the pre-trained model files, e.g. deepspeech-0.9.3-models.pbmm (loaded below)
!curl -LO
!curl -LO

Extract text

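from deepspeech import Model
import wave
import numpy as np

# load the pre-trained acoustic model
model = Model('./deepspeech-0.9.3-models.pbmm')
# open the prepared audio (assumed here to be the dogs_audio.wav created above)
fin = wave.open('dogs_audio.wav', 'rb')
# read all frames as 16-bit samples
audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
fin.close()
# Perform inference
inferred_text = model.stt(audio)
print(inferred_text)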
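
The snippet above loads the whole recording into memory before calling `stt`. For longer files, one alternative is DeepSpeech's streaming interface (`createStream`, `feedAudioContent`, `finishStream`), which accepts audio a piece at a time. The sketch below is one possible way to use it, not the only one; the helper names `iter_pcm_chunks` and `transcribe_in_chunks` and the `chunk_frames` default are assumptions made for illustration.

```python
import wave


def iter_pcm_chunks(wav_path, chunk_frames=16000):
    """Yield successive blocks of raw PCM bytes from a WAV file.

    Hypothetical helper: with 16 kHz audio, the default of 16000 frames
    corresponds to roughly one second per chunk.
    """
    with wave.open(wav_path, "rb") as fin:
        while True:
            frames = fin.readframes(chunk_frames)
            if not frames:
                break
            yield frames


def transcribe_in_chunks(model_path, wav_path):
    """Feed a WAV file to DeepSpeech's streaming API chunk by chunk."""
    # Imported here so iter_pcm_chunks stays usable without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    stream = model.createStream()
    for chunk in iter_pcm_chunks(wav_path):
        # The stream expects 16-bit samples, same as model.stt.
        stream.feedAudioContent(np.frombuffer(chunk, np.int16))
    # finishStream runs the final decode and returns the transcript.
    return stream.finishStream()
```

For example, `transcribe_in_chunks('./deepspeech-0.9.3-models.pbmm', 'dogs_audio.wav')` would return the transcript as a string, assuming both files exist in the working directory.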



Renee LIN


Passionate about web dev and data analysis. Huge FFXIV fan.