Speech processing
Yaser Shekofteh; Farshad Almasganj
Volume 6, Issue 1, June 2012, Pages 17-33
Abstract
Recent research shows that the nonlinear and chaotic behavior of the speech signal can be studied in the reconstructed phase space (RPS). The delay embedding theorem is a useful tool for studying embedded speech trajectories in the RPS. Characteristics of these trajectories have rarely been used in practical speech recognition systems. Therefore, this paper proposes a new feature extraction (FE) method based on the parameters of vector autoregressive (VAR) analysis of the speech trajectories. In this method, the filter and reflection matrices obtained by applying VAR analysis to the static and dynamic information of the speech trajectory in the RPS yield a high-dimensional feature vector. Different transformation methods are then used to obtain final feature vectors of appropriate dimension. Results of discrete and continuous phoneme recognition on the FARSDAT speech corpus show that the proposed FE method outperforms other time-domain FE methods such as LPC and LPREF.
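The two ideas named in this abstract, delay embedding and VAR analysis of the embedded trajectory, can be illustrated with a minimal sketch. It is not the authors' method: it fits only a first-order VAR model by ordinary least squares and does not compute the filter and reflection matrices the abstract mentions. The function names `delay_embed` and `fit_var1`, the embedding dimension, the lag, and the toy sinusoidal frame are all assumptions chosen for illustration.

```python
import numpy as np

def delay_embed(signal, dim=3, lag=2):
    """Delay-embed a 1-D signal.

    Row n of the result is [x[n], x[n + lag], ..., x[n + (dim - 1) * lag]].
    The values of dim and lag are illustrative assumptions.
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for the chosen dim and lag")
    return np.stack([x[i * lag : i * lag + n_points] for i in range(dim)], axis=1)

def fit_var1(trajectory):
    """Least-squares fit of a first-order VAR model y[n+1] ~ A @ y[n].

    Returns the (dim x dim) coefficient matrix A.
    """
    past, future = trajectory[:-1], trajectory[1:]
    # Solve past @ A.T ~ future in the least-squares sense.
    coeffs, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coeffs.T

# Toy "speech frame": a pure sinusoid stands in for real speech samples.
frame = np.sin(2 * np.pi * 0.05 * np.arange(200))
traj = delay_embed(frame, dim=3, lag=2)   # trajectory in the RPS
A = fit_var1(traj)                        # VAR(1) coefficient matrix
features = A.ravel()                      # flattened into a feature vector
```

Flattening the coefficient matrix gives one feature vector per frame; a higher model order or a larger embedding dimension would increase its length, which is where a dimension-reducing transformation of the kind the abstract describes would come in.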
Speech processing
Mohammad Reza Yazdchi; Seyed Ali Seyed Salehi
Volume 1, Issue 3, June 2007, Pages 201-213
Abstract
One of the most important challenges in automatic speech recognition (ASR) is the mismatch between training and testing data. To reduce this mismatch, conventional methods either enhance the speech or adapt the statistical model. Training the model under different conditions is another such method. The success of these methods still appears rudimentary compared with the human cognitive and recognition system. In this paper, inspiration from the human recognition system led to the development and implementation of a new connectionist lexical model. The integration of imputation and classification in a single neural network (NN) for ASR with missing data was investigated. This can be considered a variant of multi-task learning, because the imputation and classification tasks are trained in parallel. Cascading this model with the acoustic model corrects the phoneme sequence extracted by the acoustic model toward the desired sequence. The approach was implemented on 400 isolated words of the TFARSDAT database (an actual telephone database). In the best case, phoneme recognition accuracy increased by 16.9 percent. Incorporating prior (high-level) knowledge into acoustic-phonetic (lower-level) information can improve recognition. By cascading the lexical model and the acoustic model, the feature parameters were corrected using inversion techniques in the neural networks. Speech enhancement by this method had a remarkable effect on the mismatch between the training and testing data. The combined lexical model and speech enhancement improved phoneme recognition accuracy by 18 percent compared with the acoustic model alone.
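The idea of combining imputation and classification in one network can be sketched as a shared hidden layer feeding two output heads, scored by a joint loss. This is only an illustration under assumed settings, not the architecture of the paper: the layer sizes, the random weights, the masking rate, and the names `forward` and `joint_loss` are hypothetical, and the sketch computes the forward pass and loss without any training step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_features, n_hidden, n_classes = 12, 16, 5

# One shared layer and two task-specific heads (random, untrained weights).
W_shared = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_impute = rng.normal(scale=0.1, size=(n_hidden, n_features))
W_class = rng.normal(scale=0.1, size=(n_hidden, n_classes))

def forward(x_masked):
    """Return (reconstructed features, class probabilities)."""
    hidden = np.tanh(x_masked @ W_shared)
    reconstruction = hidden @ W_impute            # imputation head
    logits = hidden @ W_class                     # classification head
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    exp_logits = np.exp(logits)
    probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)
    return reconstruction, probs

def joint_loss(x_clean, mask, labels, weight=1.0):
    """Cross-entropy plus weighted squared error on the missing entries.

    mask is 1 where a feature is observed and 0 where it is missing.
    """
    reconstruction, probs = forward(x_clean * mask)
    missing = 1.0 - mask
    impute_loss = np.sum(missing * (reconstruction - x_clean) ** 2) / max(missing.sum(), 1.0)
    class_loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    return class_loss + weight * impute_loss

# Toy batch: random features with roughly 30% of entries marked missing.
x_clean = rng.normal(size=(8, n_features))
mask = (rng.random((8, n_features)) > 0.3).astype(float)
labels = rng.integers(0, n_classes, size=8)
loss = joint_loss(x_clean, mask, labels)
```

Minimizing a single summed loss is one common way to train two tasks in parallel on shared weights; the `weight` parameter is an assumed knob for balancing the two terms.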