Use Open Source Speech Recognition

To start, where would you ideally run a quick-and-dirty speech recognition project? On Windows 10 (and this likely applies to a Mac as well), the best place is probably an Anaconda environment. I will assume that is your setup, since the way Anaconda handles certain C++ dependencies avoids some complications.

Pocketsphinx

We will use CMU Sphinx, although I am aware that the CMU Sphinx development team has pivoted to VOSK. For now, let’s work through the older problems associated with CMU Sphinx, since many fundamental points can be addressed through that platform.

If you are on Windows 10, you must first install the Visual C++ Build Tools, which will take some time. That installation is largely independent of the next step, so in the meantime you can already run the following command to get started:

pip install SpeechRecognition

Once you try calling SpeechRecognition, which reaches CMU Sphinx through Python bindings installed via pip, you may receive an error because the C++ build tools are not yet visible to Anaconda’s Python environment. If so, wait until the Build Tools installation has finished and then try again (here ‘install’ effectively means ‘build’, because pip may have to compile the relevant native dependencies on your machine).

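Next, you may also need PortAudio, which SpeechRecognition relies on for microphone input. PortAudio itself is a C library; from Python it is reached through the PyAudio bindings. If pip has to build them, this will definitely need the Visual C++ Build Tools (or Visual Studio) installed.

pip install PyAudio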
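
Before moving on, it can help to confirm that the packages installed so far are actually visible to the Python interpreter Anaconda is running. The following is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and the import names speech_recognition and pyaudio assume the SpeechRecognition and PyAudio packages (PyAudio being the Python bindings over PortAudio).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: SpeechRecognition -> speech_recognition,
    # PyAudio -> pyaudio. Adjust the list if your setup differs.
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Not visible to this Python environment:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

If a package shows up as missing even though pip reported success, pip and python are probably pointing at different environments, which is the kind of mismatch running everything inside one Anaconda environment is meant to avoid.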

Finally, make sure to install SWIG, because it will permit you to build the relevant pocketsphinx dependencies (and pocketsphinx itself) as if you were in a kind of Linux environment. Installing it with conda helps fix a lot of the discrepancies associated with Windows.

conda install swig
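
Since the pocketsphinx build depends on SWIG, a quick check that the swig executable is reachable from the current environment may save a failed build. This is a small sketch, assuming the conda package places an executable named swig on PATH; find_tool is a hypothetical helper name.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if not found."""
    return shutil.which(name)


if __name__ == "__main__":
    swig_path = find_tool("swig")
    if swig_path is None:
        print("swig was not found on PATH; building pocketsphinx may fail.")
    else:
        print("swig found at:", swig_path)
```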

pip install pocketsphinx
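
With everything installed, a short smoke test can show whether the pieces fit together. The sketch below assumes the SpeechRecognition package's Recognizer, AudioFile and recognize_sphinx interface. The function name transcribe_wav and the file name sample.wav are placeholders of mine, not part of any library. It returns None when the setup is incomplete instead of raising, so it is safe to run at any point during the installation.

```python
import os


def transcribe_wav(path):
    """Transcribe a PCM WAV file offline with CMU Sphinx.

    Returns the recognized text, an empty string if no speech was
    understood, or None if the required packages are not usable.
    """
    try:
        import speech_recognition as sr
    except ImportError:
        return None  # SpeechRecognition is not installed in this environment

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return ""  # Sphinx ran but could not make out any speech
    except sr.RequestError:
        return None  # pocketsphinx or its language data is missing


if __name__ == "__main__":
    sample = "sample.wav"  # hypothetical file name; use your own recording
    if os.path.exists(sample):
        print(transcribe_wav(sample))
    else:
        print("Place a short PCM WAV recording at", sample, "to try this out.")
```

Reading from a file rather than the microphone keeps PortAudio out of the picture, so a failure here points at SpeechRecognition or pocketsphinx rather than at the audio stack.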