Feature Extraction based on Linear Modeling of Embedded Speech Trajectory in the Reconstructed Phase Space for Speech Recognition System

Shekofteh, Yaser; Almasganj, Farshad

doi:10.22041/ijbme.2012.13096

Document Type: Full Research Paper

Authors

Yaser Shekofteh ¹
Farshad Almasganj ²

¹ Ph.D. Candidate, Bioelectric Department, Faculty of Biomedical Engineering, Amirkabir University of Technology

² Associate Professor, Bioelectric Department, Faculty of Biomedical Engineering, Amirkabir University of Technology

Abstract

Recent research shows that the nonlinear and chaotic behavior of the speech signal can be studied in the reconstructed phase space (RPS). The delay embedding theorem is a useful tool for studying embedded speech trajectories in the RPS. However, the characteristics of these trajectories have rarely been used in practical speech recognition systems. This paper therefore proposes a new feature extraction (FE) method based on the parameters of a vector autoregressive (VAR) analysis of the speech trajectories. In this method, the filter and reflection matrices obtained by applying VAR analysis to the static and dynamic information of the speech trajectory in the RPS yield a high-dimensional feature vector. Different transformation methods are then used to obtain final feature vectors of appropriate dimension. Results of discrete and continuous phoneme recognition on the FARSDAT speech corpus show that the proposed FE method performs better than other time-domain FE methods such as LPC and LPREF.
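The core idea of the abstract, embedding a speech frame in the RPS and describing the resulting trajectory with a VAR model, can be illustrated with a minimal sketch. This is not the authors' implementation. The embedding dimension, lag, model order, the synthetic test frame, and the function names `delay_embed` and `fit_var` are all assumptions chosen for illustration. An ordinary least-squares estimator stands in for whatever VAR estimator the paper uses, so only filter (coefficient) matrices are produced; reflection matrices, dynamic information, and the later dimension-reducing transformations are not covered.

```python
import numpy as np


def delay_embed(x, dim, lag):
    """Return the delay vectors of a 1-D signal as rows of an (n, dim) array."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("signal too short for this dim and lag")
    # Row t is [x[t], x[t + lag], ..., x[t + (dim - 1) * lag]].
    return np.stack([x[i * lag: i * lag + n] for i in range(dim)], axis=1)


def fit_var(traj, order):
    """Least-squares fit of traj[t] ~ sum_k A_k @ traj[t - k], k = 1..order.

    Returns the coefficient matrices as an array of shape (order, d, d).
    """
    traj = np.asarray(traj, dtype=float)
    n, d = traj.shape
    if n <= order:
        raise ValueError("trajectory too short for this model order")
    # Regressor rows hold the previous `order` points side by side.
    past = np.hstack([traj[order - k: n - k] for k in range(1, order + 1)])
    target = traj[order:]
    coef, *_ = np.linalg.lstsq(past, target, rcond=None)
    # Block k of coef is the transpose of A_{k+1}; undo the transpose.
    return coef.reshape(order, d, d).transpose(0, 2, 1)


# Hypothetical stand-in for one speech frame: a noisy sinusoid.
rng = np.random.default_rng(0)
t = np.arange(400)
frame = np.sin(0.1 * t) + 0.01 * rng.standard_normal(t.size)

trajectory = delay_embed(frame, dim=3, lag=2)   # assumed dim and lag
filters = fit_var(trajectory, order=2)          # assumed model order
features = filters.ravel()                      # raw high-dimensional vector
```

With these assumed settings the raw feature vector has order × dim × dim = 18 entries per frame, which shows why a transformation step is needed to bring the features down to an appropriate dimension.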

Main Subjects

Speech processing

Iranian Journal of Biomedical Engineering

Volume 6, Issue 1 - Serial Number 1
June 2012
Pages 17-33
