A friend wanted to use CMU Sphinx with some students but had some trouble making it work. We used the French database[1], as our English accents weren’t good enough for the recognition. To use it, you will need:
- CMU Sphinx,
- ALSA (and possibly Jack),
- a dictionary (a French one in my case),
- a microphone (with the appropriate configuration).
Package installation or compilation
There are two possible installation processes:
- Sphinx: the standalone application written in Java
- PocketSphinx: the C library used mainly in embedded systems
The starting idea was to use it on a Raspberry Pi, so I decided to install PocketSphinx[2]. If you use any mainstream GNU/Linux distribution, you should find the required packages, and the package manager should take care of all the dependencies. If, like me, you’re using Slackware, you will probably have some difficulty finding appropriate SlackBuilds (you can find old ones in my repository).
If you need to compile everything yourself, you will have to satisfy the dependencies of each library; I invite you to read the INSTALL file in each library tarball.
Audio path configuration
Once the installation is done, you now need to configure the audio path correctly. In my case, I used the Jack audio server in order to get real-time privileges[3]. If you have trouble giving these rights to Jack, you can refer to this (very light) tutorial. Once Jack is configured, you can start it with:
jackd -R -d alsa -d hw:0,0 -r 44100
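Whether jackd -R actually gets real-time scheduling depends, for a non-root user, on the RLIMIT_RTPRIO resource limit, which is usually raised through /etc/security/limits.conf. The following Python sketch (Linux only, helper names hypothetical) reads that limit with the standard resource module before you start Jack. It is only a rough check: it ignores capabilities such as CAP_SYS_NICE.

```python
import resource


def rtprio_limit() -> int:
    """Soft RLIMIT_RTPRIO of the current process (Linux only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
    return soft


def can_run_realtime(rtprio_soft: int, is_root: bool = False) -> bool:
    """Rough check: root, an unlimited limit, or a limit above 0 should allow SCHED_FIFO."""
    return is_root or rtprio_soft == resource.RLIM_INFINITY or rtprio_soft > 0
```

For example, can_run_realtime(rtprio_limit(), is_root=os.geteuid() == 0) returning False suggests that jackd -R will fall back to normal scheduling or refuse to start.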
If you don’t know the ID of your sound card (hw:0,0 in the jackd command above, i.e. card 0, device 0), you can find it with the aplay -l command. The -r option sets the sample rate (44100 Hz here).
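If you prefer to script this step, the Python sketch below extracts the hw:CARD,DEVICE identifiers from the text printed by aplay -l and assembles the same jackd command line as above. It assumes the usual "card N: ..., device M: ..." line format; the function names and the sample card line in the comment are hypothetical.

```python
import re
from typing import List

# Hypothetical example of a matching line:
# "card 0: PCH [HDA Intel PCH], device 0: ALC892 Analog [ALC892 Analog]"
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+):")


def alsa_devices(aplay_output: str) -> List[str]:
    """Collect the hw:CARD,DEVICE identifiers listed by `aplay -l`."""
    devices = []
    for line in aplay_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            devices.append(f"hw:{match.group(1)},{match.group(2)}")
    return devices


def jackd_command(device: str = "hw:0,0", rate: int = 44100) -> List[str]:
    """The jackd invocation shown above: -R (real time), ALSA driver, device, sample rate."""
    return ["jackd", "-R", "-d", "alsa", "-d", device, "-r", str(rate)]
```

In practice you would feed alsa_devices() the stdout of subprocess.run(["aplay", "-l"], capture_output=True, text=True) and pass the chosen identifier to jackd_command().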
From this point, in theory, you are ready to use the PocketSphinx library. Nevertheless, the recognition algorithm is very sensitive to nonlinearities in the signal, which means that you should configure your microphone input properly to avoid any saturation. You can either record a portion of the signal and analyse it with Matlab/Octave or Python, or, more simply (less accurate for sure, but good enough in this case), loop the microphone signal back to your headphones and check “by ear” for any saturation.
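For the "record and analyse" option, here is a minimal Python sketch using only the standard library. It assumes a 16-bit PCM WAV recording of the microphone and reports the peak level in dBFS together with the fraction of samples within 1% of full scale; a fraction noticeably above zero suggests the input gain is too high. The 0.99 threshold and the function names are arbitrary choices for illustration, not PocketSphinx requirements.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def read_pcm16(source):
    """Read a 16-bit PCM WAV file (path or file object); channels stay interleaved."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    return samples


def saturation_report(samples, threshold=0.99):
    """Return (peak level in dBFS, fraction of samples >= threshold * full scale)."""
    if len(samples) == 0:
        raise ValueError("empty recording")
    limit = threshold * FULL_SCALE
    peak = min(max(abs(s) for s in samples), FULL_SCALE)  # -32768 counts as full scale
    near_clip = sum(1 for s in samples if abs(s) >= limit)
    peak_dbfs = 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")
    return peak_dbfs, near_clip / len(samples)
```

Usage, with a hypothetical file name: peak, ratio = saturation_report(read_pcm16("mic_test.wav")).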
To do this, you can use Jack. First, list the I/Os of your system with the jack_lsp command. It should give something close to this:
# jack_lsp output
system:capture_1
system:capture_2
system:playback_1
system:playback_2
For example, if your microphone is plugged into input 1 and your headphones into output 1, ask Jack to connect the I/Os as follows:
jack_connect system:capture_1 system:playback_1
If you prefer to have the signal in both ears, repeat the same operation with playback_2 as the destination. You can now adjust the level with the alsamixer command. Once you are inside the mixer, everything related to the inputs can be accessed by pressing the “F4” key.
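The listing and connection steps can also be scripted. This Python sketch, with hypothetical function names, turns jack_lsp output (one port name per line) into the jack_connect command lines that route one capture port to every system:playback_* port, i.e. to both ears in the two-channel case described above.

```python
from typing import List


def parse_ports(jack_lsp_output: str) -> List[str]:
    """Keep the non-empty lines of jack_lsp output, one port name each."""
    return [line.strip() for line in jack_lsp_output.splitlines() if line.strip()]


def monitor_commands(ports: List[str], capture: str = "system:capture_1") -> List[List[str]]:
    """jack_connect command lines routing `capture` to every system:playback_* port."""
    if capture not in ports:
        raise ValueError(f"unknown capture port: {capture}")
    playbacks = [port for port in ports if port.startswith("system:playback_")]
    return [["jack_connect", capture, port] for port in playbacks]
```

With Jack running, you would feed parse_ports() the stdout of subprocess.run(["jack_lsp"], capture_output=True, text=True) and pass each returned list to subprocess.run(cmd, check=True).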
You can now really use PocketSphinx. By default, the system uses an English dictionary. In our case, we are interested in French, so we need to download the French language model, dictionary and acoustic model.
You then need to extract the different archives. The DMP file is compressed with bzip2, and you should use the following command to extract it:
bzip2 -d file.lm.dmp.bz2
The other two archives are regular tarballs:
tar xvf file.tar.gz
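If you would rather do this from a script, Python's standard bz2 and tarfile modules can do the same job. A sketch with hypothetical function names and placeholder paths; unlike bzip2 -d, it keeps the compressed file.

```python
import bz2
import shutil
import tarfile
from pathlib import Path
from typing import List


def decompress_bz2(src: Path) -> Path:
    """Like `bzip2 -d`, but keeps the .bz2 file: file.lm.dmp.bz2 -> file.lm.dmp."""
    dst = src.with_suffix("")  # drops only the final ".bz2"
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def extract_tarball(src: Path, dest: Path) -> List[str]:
    """Like `tar xvf`: extract into dest and return the member names."""
    with tarfile.open(src, "r:*") as tar:  # "r:*" detects gzip/bzip2/xz compression
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # rejects members escaping dest
        else:
            tar.extractall(dest)  # older Pythons without extraction filters
        return tar.getnames()
```

For example: decompress_bz2(Path("file.lm.dmp.bz2")) and extract_tarball(Path("file.tar.gz"), Path(".")).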
Now you are completely ready to play with PocketSphinx by passing it the dictionary, language model and acoustic model as arguments:
pocketsphinx_continuous -hmm acoustic_model_folder/ -dict frenchWords.dic -lm french.lm.dmp
A lot of information is displayed in the console, but the essential keyword is “READY…”. If you see it, you are ready to speak and enjoy the speech recognition. On average, we can expect a recognition rate of about 70%. The next step would be to connect the speech recognition engine to something else in order to control anything you want.
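As a possible starting point for that next step, the Python sketch below launches pocketsphinx_continuous with the three options used above, waits for the READY keyword, then calls an action whenever a chosen word appears in a later output line. It assumes that recognized phrases are printed as plain lines of text, and it merges stderr into stdout because the log and result streams may differ between versions; check both assumptions against your own installation. The function names, keywords and actions are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def build_command(hmm_dir: str, dict_file: str, lm_file: str) -> List[str]:
    """Same invocation as above: acoustic model, dictionary and language model."""
    return ["pocketsphinx_continuous", "-hmm", hmm_dir, "-dict", dict_file, "-lm", lm_file]


def dispatch(lines: Iterable[str], actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Skip output until a word starting with READY, then run the action of every
    (lowercase) keyword found as a whole word in later lines. Returns the keywords fired."""
    ready = False
    fired = []
    for line in lines:
        words = line.lower().split()
        if not ready:
            ready = any(word.startswith("ready") for word in words)
            continue
        for keyword, action in actions.items():
            if keyword in words:
                action()
                fired.append(keyword)
    return fired


def run(hmm_dir: str, dict_file: str, lm_file: str,
        actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the recognizer until it exits (stop it with Ctrl-C when using the microphone)."""
    cmd = build_command(hmm_dir, dict_file, lm_file)
    # stderr merged into stdout: READY and the results may not use the same stream.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True) as proc:
        return dispatch(proc.stdout, actions)
```

Matching whole words keeps the sketch simple; a real controller would probably also need to ignore log lines that happen to contain a keyword.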
[1] The database can be generic or specific; the choice depends on the intended use in production (for example, a hotline).
[2] Warning: the installation was performed on an x86 platform, not on a Raspberry Pi, so a few changes may be required.
[3] This type of application usually needs low-latency buffers and specific (real-time) privileges. Nevertheless, using an RT kernel isn’t mandatory.