Sphinx installation

A friend wanted to use CMU Sphinx with some students but had trouble getting it to work. We used the French database [1], as our English accents weren’t good enough for the recognizer. To use it, you need the following:

  • CMU Sphinx,
  • ALSA (and possibly JACK),
  • a dictionary (a French one in my case),
  • a microphone (with the appropriate configuration).

Package installation or compilation

There are two possible variants to install:

  • Sphinx: the standalone application written in Java
  • PocketSphinx: the C library used mainly in embedded systems

The initial idea was to use it on a Raspberry Pi, so I decided to install PocketSphinx [2]. If you use any mainstream GNU/Linux distribution, you should find the required packages, and the dependencies should be resolved automatically. If, like me, you’re using Slackware, you will probably have some difficulty finding appropriate SlackBuilds (you can find old ones on my repository).

Dependencies

If you need to compile everything yourself, you’ll need to satisfy the following dependencies:

  • Bison
  • ALSA
  • sphinxbase
  • sphinxtrain

In that case, I invite you to read the INSTALL file in each library tarball carefully.

Audio path configuration

Once the installation is done, you need to configure the audio path correctly. In my case, I used the JACK audio server in order to have real-time privileges [3]. If you have trouble giving these rights to JACK, you can refer to this (very light) tutorial. Once JACK is configured, you can run jackd:

jackd -R -d alsa -d hw:0,0 -r 44100

If you don’t know the ID of your sound card (the hw:<ID> in the command above), you can use the aplay -l command. The -r option sets the sample rate.
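To illustrate how the card and device numbers map to the hw:<ID> argument, here is a minimal Python sketch that extracts them from the text printed by aplay -l. The function name list_hw_ids and the sample text are made up for illustration, and the pattern assumes the usual English-locale output format ("card N: ..., device M: ...").

```python
import re

# Matches device lines as printed by `aplay -l` in an English locale, e.g.
# "card 0: PCH [HDA Intel PCH], device 0: ALC269VC Analog [ALC269VC Analog]"
_DEVICE_LINE = re.compile(r"^card (\d+): .*?, device (\d+): ")


def list_hw_ids(aplay_output):
    """Return ALSA device names such as 'hw:0,0' found in `aplay -l` output."""
    ids = []
    for line in aplay_output.splitlines():
        match = _DEVICE_LINE.match(line)
        if match:
            ids.append("hw:{},{}".format(match.group(1), match.group(2)))
    return ids


# Illustrative sample only, not captured from a real machine.
SAMPLE = """**** List of PLAYBACK Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC269VC Analog [ALC269VC Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
"""

print(list_hw_ids(SAMPLE))
```

Each returned value can be passed as the device argument of the jackd command.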

From this point, in theory, you are ready to use the PocketSphinx library. However, the recognition algorithm is very sensitive to non-linearities in the signal, so you should configure your microphone input properly to avoid any saturation. You can either record a portion of the signal and analyse it using Matlab/Octave or Python or, more simply (less accurate, but sufficient in this case), loop the microphone signal to your headphones and assess any saturation “by ear”. JACK can do this. First, list the I/Os of your system using the jack_lsp command. The output should look something like this:

# jack_lsp output
system:capture_1
system:capture_2
system:playback_1
system:playback_2

Let’s assume that your microphone is plugged into input 1 and your headphones into output 1. We ask JACK to connect the I/Os as follows:

jack_connect system:capture_1 system:playback_1

If you prefer to have the signal in both ears, repeat the operation above with playback_2. You can now adjust the level using alsamixer. Once you are inside the mixer, press “F4” to access everything related to the inputs.
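For the more quantitative route (recording a portion of signal and analysing it in Python), a rough sketch could count how many samples sit at or near full scale. This is only an assumption of how such a check might look: the function name clipped_fraction and the 0.99 threshold are arbitrary choices, and the file is assumed to be a 16-bit PCM WAV recording.

```python
import array
import sys
import wave


def clipped_fraction(path, threshold=0.99):
    """Return the fraction of samples at or above `threshold` of full scale.

    Assumes a 16-bit PCM WAV file; all channels are counted together.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if not samples:
        return 0.0

    limit = int(threshold * 32767)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)
```

Calling clipped_fraction on a short test recording and getting a value clearly above zero suggests that the input gain should be lowered in alsamixer.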

You can now actually use PocketSphinx. By default, the system uses an English dictionary. In our case, we are interested in French, so we are going to download the French language model, dictionary and acoustic model.

Next, extract the archives. The DMP file is bzip2-compressed, and you should use the following command to extract its content:

bzip2 -d file.lm.dmp.bz2

The other two archives are regular tarballs:

tar xvf file.tar.gz
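If you would rather script these extraction steps, the same thing can be sketched with the Python standard library. The function names below are illustrative, and the file names you pass in are whatever you downloaded. Unlike the command-line decompression, this version leaves the compressed file in place.

```python
import bz2
import shutil
import tarfile


def decompress_bz2(src, dst):
    """Decompress the bzip2 file `src` into `dst`, keeping `src` untouched."""
    with bz2.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out)


def extract_tarball(src, dest_dir="."):
    """Extract a tarball (compressed or not) into `dest_dir`.

    Only use this on archives you trust: members are extracted as-is.
    """
    with tarfile.open(src, "r:*") as archive:
        archive.extractall(dest_dir)
```

For example, decompress_bz2("file.lm.dmp.bz2", "file.lm.dmp") followed by extract_tarball("file.tar.gz") mirrors the two commands.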

Now you are completely ready to play with PocketSphinx by passing it the acoustic model, the dictionary and the language model as arguments:

pocketsphinx_continuous -hmm acoustic_model_folder/ -dict frenchWords.dic -lm french.lm.dmp

A lot of information will be displayed in the console, but the essential keyword is “READY…”. Once you see it, you can speak and enjoy the speech recognition. On average, we can expect a recognition rate of about 70%. The next step would be to connect the speech recognition engine to something else in order to control anything you want.
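As a possible starting point for that next step, here is a hedged Python sketch that launches the recognizer and waits for the ready message. The helper names are invented for this example, and only the three options used above are passed. Error output is merged with standard output, and the marker is matched case-insensitively, on the assumption that the exact stream and spelling of the message may differ between versions.

```python
import subprocess


def build_command(hmm_dir, dict_path, lm_path):
    """Build the pocketsphinx_continuous command line used in this post."""
    return [
        "pocketsphinx_continuous",
        "-hmm", hmm_dir,
        "-dict", dict_path,
        "-lm", lm_path,
    ]


def wait_until_ready(lines, marker="READY"):
    """Consume lines until one contains `marker` (case-insensitive)."""
    for line in lines:
        if marker.lower() in line.lower():
            return True
    return False


def start_recognizer(hmm_dir, dict_path, lm_path):
    """Start the recognizer with log and result lines merged into one stream."""
    return subprocess.Popen(
        build_command(hmm_dir, dict_path, lm_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )


# Usage sketch (requires PocketSphinx and the model files to be present):
#   proc = start_recognizer("acoustic_model_folder/", "frenchWords.dic", "french.lm.dmp")
#   if wait_until_ready(proc.stdout):
#       for line in proc.stdout:  # keep reading so the pipe never fills up
#           print(line, end="")
```

Anything that reacts to recognized words would go in the final loop, where each line of output can be inspected before acting on it.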


  1. The database can be generic or specific, depending on how it will be used in production (for example, a hotline).
  2. Warning: the installation was performed on an x86 platform and not on a Raspberry Pi, so a few changes may be required.
  3. This type of application usually needs low-latency buffers, so specific privileges are in order. Nevertheless, the use of an RT kernel isn’t mandatory.