Porting Hindi ASR onto Raspberry Pi 3

Academic Project for 6th Semester

Automatic speech recognition (ASR) systems work in two stages: parameterization of the input speech signal into features, followed by training and testing a classifier on those features. Researchers have proposed a variety of acoustic models to accomplish this complex task. Deep learning is currently one of the most reliable and capable approaches for building more accurate speech recognition and natural language processing (NLP) models. ASR systems for low-resource languages have recently gained popularity. India has 22 official languages and over 2,000 regional languages, the majority of which have limited resources. Standard resources for Hindi are also minimal. In our project, a Time Delay Neural Network (TDNN) was used to implement a continuous Hindi ASR system with the Kaldi automatic speech recognition toolkit; it significantly improves on the performance of a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) based Hindi ASR system. We deployed this system on a Raspberry Pi using Vosk, an offline speech recognition toolkit, to create a portable device that can detect, receive and process voice audio data in real time.

Hardware and Software

Raspberry Pi 3
ReSpeaker 4-mic Array
Python3
Kaldi-trained TDNN model (Hindi ASR)
Vosk-API
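
As a rough illustration of how the components above fit together, the sketch below decodes audio with a Kaldi-trained model through the Vosk Python bindings. It is not the project's actual code: it reads a 16-bit mono WAV file as a stand-in for the live ReSpeaker microphone stream, and the names transcribe_wav, extract_text, wav_path, model_dir and chunk_frames are hypothetical. It assumes the vosk package is installed and that model_dir points to a model directory packaged in the layout Vosk expects.

```python
import json
import wave


def extract_text(result_json):
    """Return the recognised text from a Vosk result JSON string."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path, model_dir, chunk_frames=4000):
    """Decode a WAV file chunk by chunk, as a streaming recogniser would."""
    # Imported here so extract_text() can be used without Vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        # Assumption: the audio is 16-bit mono PCM.
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has ended.
            if rec.AcceptWaveform(data):
                pieces.append(extract_text(rec.Result()))
        # Flush whatever is still buffered in the recogniser.
        pieces.append(extract_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

For live use on the Pi, the same loop could be fed with chunks captured from the microphone instead of wf.readframes; the capture library is left open here because the source does not name one.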

My Contribution

Training and testing the GMM-HMM and TDNN models using the Kaldi speech recognition toolkit and the Vosk API

Final Result

After compiling the model with Kaldi, the training word error rate (WER) was found to be 45%. By executing the code with different input samples, we found the WER for test data to range from 0% to 64%, with an average of 23%. This spread reflects the differences between the test data and the training data, such as variations in dialect, accent, vocabulary and enunciation.
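
For readers unfamiliar with the metric, word error rate is conventionally computed as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words needed to turn the reference transcript into the recognised one, and N is the number of words in the reference. The sketch below is a minimal illustration of that definition using a word-level edit distance. It is not the scoring script used in this project (Kaldi ships its own scoring tools), and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, the value can exceed 1.0 (100%) when the recogniser outputs many extra words.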