Convert Video/Audio to Text with Mozilla DeepSpeech

Prepare the audio file

!pip3 install ffmpeg
# convert video to audio
!ffmpeg -i dogs.mp4 -b:a 64K -ar 16000 -ac 1 -vn dogs_audio.wav
# if you already have audio, but not in the right format
!ffmpeg -i dogs.wav -vn -ar 16000 -ac 1 dogs_audio.wav
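
The same conversion can also be driven from Python, which helps when there are many files to process. The following is a minimal sketch, assuming the ffmpeg command-line tool is available on PATH. The helper names build_ffmpeg_cmd and convert_to_wav are hypothetical and not part of any library, and the -y flag (overwrite the output file) is an addition to the command shown above.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list for a mono WAV at the given sample rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists (assumption)
        "-i", src,                # input video or audio file
        "-vn",                    # drop any video stream
        "-ar", str(sample_rate),  # resample to 16 kHz by default
        "-ac", "1",               # downmix to a single channel
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
```

Calling convert_to_wav('dogs.mp4', 'dogs_audio.wav') should produce the same kind of file as the shell command above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.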

Install DeepSpeech & download pre-trained model

!pip3 install deepspeech
!curl -LO
!curl -LO

Extract text

from deepspeech import Model 
import wave
import numpy as np
model = Model('./deepspeech-0.9.3-models.pbmm')
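# open the 16 kHz mono WAV file prepared earlier
fin = wave.open('dogs_audio.wav', 'rb')
# read all frames as 16-bit signed integers
audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
fin.close()
# Perform inference
inferred_text = model.stt(audio)
print(inferred_text)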
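
If the WAV file does not match what the model expects (mono, 16-bit samples, 16 kHz), the transcription tends to come out garbled rather than failing with an error. The sketch below checks the file before running inference. The function names read_pcm16 and transcribe are hypothetical helpers. The sketch assumes the model object exposes sampleRate() and stt() as in the DeepSpeech 0.9 Python package.

```python
import wave


def read_pcm16(path, expected_rate=16000):
    """Return the raw PCM bytes of a mono 16-bit WAV file, or raise ValueError."""
    with wave.open(path, "rb") as fin:
        if fin.getnchannels() != 1:
            raise ValueError("expected mono audio, got %d channels" % fin.getnchannels())
        if fin.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples, got %d bytes per sample" % fin.getsampwidth())
        if fin.getframerate() != expected_rate:
            raise ValueError("expected %d Hz, got %d Hz" % (expected_rate, fin.getframerate()))
        return fin.readframes(fin.getnframes())


def transcribe(model, path):
    """Validate the WAV file against the model's sample rate, then run inference."""
    # NumPy is imported here so read_pcm16 stays usable without it.
    import numpy as np

    pcm = read_pcm16(path, expected_rate=model.sampleRate())
    return model.stt(np.frombuffer(pcm, np.int16))
```

With the model loaded as above, transcribe(model, 'dogs_audio.wav') would return the recognized text, or raise a ValueError that says which property of the file needs fixing.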



