|PROJECTS AT VISL FINISHED IN 2005|
The Speech Recognition project was developed as a dedicated tool
for controlling computer applications.
Speech recognition serves more aspects of our lives than we might imagine, such as applications for the hearing impaired, voice dialing and voice-to-text (V2T) applications.
It seems that speech recognition is not only a luxury but a need as well. Our goal is to develop a fully integrated I/O system from this basic tool. The application should recognize 12 words related to voice dialing: the digits 0-9 and the words "send" and "stop" (to begin and end a call).
The system is based on a GUI (graphical user interface), which controls the basic features.
The problem (background):
Speech Recognition through learning machine faces to the
One may ask how sound signals can be recognized or classified
and, moreover, how a learning machine can be built.
A learning machine is a powerful method that can be adapted to a
wide range of problems.
Sound signal recognition/classification is adapted to the learning machine by "cleaning" the signal and passing it through a probabilistic model.
The solution (basic approach):
a. Acquiring the signals and processing them in the time domain:
calculating the signal's envelope and truncating the signal to discard noise samples.
The word "Four" : initial
The word "Four" : envelope
The word "Four" : after truncation
b. Transforming them to the frequency domain and applying a low-pass filter (LPF) to the FFT, in order to discard the frequencies where noise is expected to appear.
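Steps a and b can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the moving-average envelope, the window length, the relative threshold, the sample rate and the cutoff frequency are all assumed values chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def envelope(signal, window=200):
    """Moving average of the rectified signal (one simple envelope estimate)."""
    kernel = np.ones(window) / window
    return np.convolve(np.abs(signal), kernel, mode="same")

def truncate(signal, env, rel_threshold=0.1):
    """Keep the span between the first and last samples whose envelope
    exceeds rel_threshold times the envelope maximum."""
    active = np.flatnonzero(env > rel_threshold * env.max())
    return signal[active[0]:active[-1] + 1]

def lowpass_spectrum(signal, sample_rate, cutoff_hz):
    """Magnitude spectrum with every FFT bin above cutoff_hz discarded."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return spectrum[freqs <= cutoff_hz]

# Synthetic stand-in for a recorded word: a 300 Hz burst surrounded by
# silence, plus weak background noise (assumed test data, not a real recording).
sample_rate = 8000
rng = np.random.default_rng(0)
t = np.arange(4000) / sample_rate
word = np.sin(2 * np.pi * 300 * t)
recording = np.concatenate([np.zeros(2000), word, np.zeros(2000)])
recording = recording + 0.01 * rng.standard_normal(recording.size)

env = envelope(recording)
trimmed = truncate(recording, env)
spectrum = lowpass_spectrum(trimmed, sample_rate, cutoff_hz=3000.0)
```

Here the truncation removes most of the silent samples on both sides of the burst, and the low-pass step simply drops the upper part of the spectrum. A real recording would likely need a threshold tuned to the recording environment.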
Modeling the signal into an observation vector.
There are several known methods for creating that vector; the
methods used in this project are:
* The BINS method: windowing the FFT into BINS, where the mean
and standard deviation are calculated in each.
We got 20 coefficients from the LPC algorithm and 20
coefficients from the BINS method (10 BINS).
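The construction of the 40-element observation vector might look like the sketch below. The text does not say which LPC variant was used or how the BINS were laid out, so the autocorrelation method for LPC and equal-width BINS over the magnitude spectrum are assumptions; the function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def bins_features(spectrum, n_bins=10):
    """Split the spectrum into n_bins windows and return the mean of each
    window followed by the standard deviation of each window."""
    chunks = np.array_split(spectrum, n_bins)
    means = [chunk.mean() for chunk in chunks]
    stds = [chunk.std() for chunk in chunks]
    return np.array(means + stds)

def lpc_features(signal, order=20):
    """LPC coefficients a_k in x[n] ~ sum_k a_k * x[n-k], obtained by solving
    the autocorrelation normal equations (assumed variant)."""
    n = len(signal)
    full = np.correlate(signal, signal, mode="full")
    r = full[n - 1:n + order]                      # autocorrelation, lags 0..order
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    toeplitz = r[lags]                             # Toeplitz autocorrelation matrix
    return np.linalg.solve(toeplitz, r[1:order + 1])

# Synthetic stand-in for a cleaned word: a first-order autoregressive signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
signal = np.zeros_like(noise)
for i in range(1, signal.size):
    signal[i] = 0.9 * signal[i - 1] + noise[i]

spectrum = np.abs(np.fft.rfft(signal))
observation = np.concatenate([lpc_features(signal), bins_features(spectrum)])
print(observation.shape)  # (40,)
```

With 10 BINS, the mean and standard deviation of each give 20 values, which together with the 20 LPC coefficients yields the 40 coefficients mentioned above.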
Passing the vector into the learning machine.
In this project the SVM (Support Vector Machine) algorithm was used to create the learning machine.
SVM tries to divide the space (of dimension N) into classes, with
maximal margins between the classes.
Two-class example, linearly separated.
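For the linearly separable two-class case, the maximal-margin idea is usually written as the optimization problem below. This is the standard textbook hard-margin formulation, given here only for orientation; the symbols (training vectors x_i, labels y_i in {-1, +1}, weight vector w, offset b, m training samples) are notation assumed for this sketch and do not come from the project itself.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,\bigl(w^{\top}x_i + b\bigr)\ \ge\ 1,
\qquad i = 1,\dots,m
```

The separating hyperplane is w^T x + b = 0 and the margin between the two classes equals 2/||w||, so minimizing ||w|| maximizes the margin.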
Moreover, sometimes a non-linear classifier is needed:
A case in which a non-linear classifier is needed.
We tried three classifier kernels: Linear, Polynomial and Radial (RBF).
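The three kernels can be illustrated with their usual definitions, as in the sketch below. The kernel parameters actually used in the project are not stated, so the polynomial degree, the offset coef0 and the RBF width gamma are assumed values, and the function names are hypothetical.

```python
import numpy as np

def linear_kernel(X, Y):
    """K(x, y) = x . y"""
    return X @ Y.T

def polynomial_kernel(X, Y, degree=3, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree  (degree and coef0 are assumed)."""
    return (X @ Y.T + coef0) ** degree

def rbf_kernel(X, Y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)  (gamma is assumed)."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Three toy 2-D points standing in for observation vectors.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_lin = linear_kernel(X, X)
K_poly = polynomial_kernel(X, X)
K_rbf = rbf_kernel(X, X)
```

Each function returns the matrix of pairwise kernel values that an SVM solver works with. The linear kernel gives a flat separating hyperplane, while the polynomial and RBF kernels allow curved decision boundaries in the original feature space.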
The three classifiers were checked on one speaker and two speakers, on
In the course of the project we reached several conclusions:
a. Despite being theoretically weaker, the Linear
classifier performed better
b. The success of the learning system is affected by:
Dictionary size: At first we had a dictionary of 100 words.
This proved to be very hard to implement, so it was reduced to 12 words.
Samples: as more samples of the words
were given, the classification got
Quality of record: the better the
environment is, the better the
Number of speakers: The system
supports one or two classifiers. The more
Coefficients handling: deriving the
coefficients from the signal is the most
c. It is important to compare other methods of creating
d. For real time system, the recording environment should be as
e. The dictionary should be expanded to support more
We are grateful to our project instructor, Dori Peleg, for his help and guidance throughout this work, and to the Lab Supervisor, Johanan Erez.