Document Type : Full Research Paper

Authors

1 Assistant Professor, Biomedical Engineering Department, Faculty of Engineering, University of Isfahan

2 Assistant Professor, Biomedical Engineering School, Amirkabir University of Technology

DOI: 10.22041/ijbme.2007.13499

Abstract

One of the most important challenges in automatic speech recognition (ASR) is the mismatch between training and testing data. To reduce this mismatch, conventional methods enhance the speech signal or adapt the statistical models; training the models on data from several different conditions is another example of such methods. Compared with the human cognitive and recognition system, the success rates of these methods remain quite primitive. In this paper, inspiration from the human recognition system helped us develop and implement a new connectionist lexical model. We investigated the integration of imputation and classification in a single neural network for ASR with missing data; this can be considered a variant of multi-task learning, since the imputation and classification tasks are trained in parallel. Cascading this model with the acoustic model corrects the phoneme sequence produced by the acoustic model toward the desired sequence. The approach was implemented on 400 isolated words of the TFARSDAT database (a real telephone speech database). In the best case, phoneme recognition correctness increased by 16.9 percent. Incorporating prior (high-level) knowledge into acoustic-phonetic (lower-level) information can improve recognition. By cascading the lexical model with the acoustic model, the feature parameters were corrected using neural network inversion techniques. Speech enhancement by this method had a remarkable effect on the mismatch between training and testing data. The efficiency of the lexical model and speech enhancement was demonstrated by an 18 percent improvement in phoneme recognition correctness compared with the acoustic model alone.
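The sketch below illustrates, in broad strokes, the two ideas named in the abstract: a single network trained in parallel on an imputation task and a phoneme classification task (the multi-task aspect), and gradient-based inversion of the trained network to pull noisy feature vectors toward a target phoneme (the enhancement step). It is a minimal illustration, not the authors' implementation; the use of PyTorch, the layer sizes, the 24-dimensional feature vectors, the 30-phoneme label set, and the loss weighting are all assumptions made only for the example.

```python
# Hypothetical sketch (not the paper's code): joint imputation + classification
# in one network, plus error back-propagation to the input ("inversion") as a
# stand-in for the feature-correction / enhancement step described above.
import torch
import torch.nn as nn

class MultiTaskASRNet(nn.Module):
    def __init__(self, n_features=24, n_phonemes=30, hidden=128):
        super().__init__()
        # Shared hidden layer: both tasks learn from the same representation.
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh())
        self.imputation_head = nn.Linear(hidden, n_features)   # reconstructs clean features
        self.classifier_head = nn.Linear(hidden, n_phonemes)   # predicts phoneme labels

    def forward(self, x):
        h = self.shared(x)
        return self.imputation_head(h), self.classifier_head(h)

def train_step(model, optimizer, noisy_x, clean_x, phoneme_y, alpha=0.5):
    """One multi-task update: imputation and classification losses trained in parallel."""
    optimizer.zero_grad()
    x_hat, logits = model(noisy_x)
    loss = alpha * nn.functional.mse_loss(x_hat, clean_x) \
         + (1 - alpha) * nn.functional.cross_entropy(logits, phoneme_y)
    loss.backward()
    optimizer.step()
    return loss.item()

def invert_to_input(model, noisy_x, target_phoneme, steps=50, lr=0.1):
    """Network inversion: freeze the weights and back-propagate the classification
    error to the input, so the feature vector drifts toward the target phoneme."""
    x = noisy_x.clone().requires_grad_(True)
    for _ in range(steps):
        _, logits = model(x)
        loss = nn.functional.cross_entropy(logits, target_phoneme)
        (grad,) = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= lr * grad
    return x.detach()

# Illustrative usage with random stand-in data.
model = MultiTaskASRNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy, clean = torch.randn(8, 24), torch.randn(8, 24)
labels = torch.randint(0, 30, (8,))
train_step(model, opt, noisy, clean, labels)
enhanced = invert_to_input(model, noisy, labels)
```

The shared hidden layer is what makes this a multi-task arrangement: the imputation head forces the representation to retain enough information to reconstruct the features, which the classification head can then exploit. The inversion routine is one common way to realize the "correcting the feature parameters" idea, under the assumptions stated above.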

