Document Type: Full Research Paper
Author
Assistant Professor, Biomedical Engineering Group, School of Engineering Science, College of Engineering, University of Tehran, Tehran, Iran
Abstract
Arousal and valence are two of the most prominent dimensions of human emotion. This article addresses the question of whether the arousal and valence emotions evoked by listening to music can be predicted adequately without physiological signals, using only demographic and musical features. For this purpose, 48 30-second music excerpts with very high and very low levels of arousal and valence were selected from the DEAM music dataset. Each excerpt was then labeled separately for arousal and valence by 175 Iranian participants aged 14-35 years, using integer labels from 1 (the lowest rate) to 5 (the highest rate). The root mean square energy, tempo, zero-crossing rate, spectral flatness, spectral centroid, spectral flux, spectral rolloff, rhythmic complexity, and chromagram features were extracted from each excerpt. The demographic features were age, gender, education level, economic level, ethnicity, zip code, and hours of music listening per day. Observations with label 3 (the middle rate) were discarded because this label occurred far less often than the others, leaving 8051 observations for classification. The data were divided into four equal, non-overlapping parts and classified four times, each time using one part for testing and the remaining parts for training. This process was repeated 10 times, and the classification criteria were averaged over the test data. The arousal and valence emotions were analyzed separately. Five classifiers were compared: a neural network, K-nearest neighbors, a support vector machine, a decision tree, and a random forest. The neural network offered the best classification performance, with 77% accuracy, 90.3% specificity, and 77% sensitivity for arousal, and 79.7% accuracy, 91.2% specificity, and 79.7% sensitivity for valence. These results suggest that a neural network can be an appropriate classifier for the musical emotions of Iranian listeners based on musical and demographic features.
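To make the feature set concrete, the sketch below shows how the listed musical features could be extracted per excerpt. The abstract does not name the extraction software, so librosa is an assumed toolchain; frame parameters, mean aggregation over time, and the use of onset strength as a proxy for spectral flux are illustrative choices, and rhythmic complexity is omitted because it has no single standard library call.

```python
# Minimal sketch of per-excerpt feature extraction (assumed toolchain: librosa).
# Aggregation by mean over frames is an illustrative choice, not the paper's spec.
import numpy as np
import librosa

def extract_features(path):
    y, sr = librosa.load(path, duration=30.0)          # 30-second excerpt
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)     # global tempo estimate
    feats = {
        "rms": np.mean(librosa.feature.rms(y=y)),
        "tempo": float(tempo),
        "zcr": np.mean(librosa.feature.zero_crossing_rate(y)),
        "flatness": np.mean(librosa.feature.spectral_flatness(y=y)),
        "centroid": np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
        # onset strength used here as a stand-in for spectral flux
        "flux": np.mean(librosa.onset.onset_strength(y=y, sr=sr)),
        "rolloff": np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)),
    }
    # 12 chroma bins, each averaged over time
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    for i, c in enumerate(chroma):
        feats[f"chroma_{i}"] = float(c)
    # rhythmic complexity is omitted: no single librosa call computes it
    return feats
```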
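The evaluation protocol in the abstract (four equal, non-overlapping folds, repeated 10 times, with test metrics averaged) can be sketched as below. The classifier hyperparameters are illustrative assumptions, not the paper's settings, and the labels are assumed to have been binarized to 0 (low) and 1 (high) after discarding label 3.

```python
# Sketch of the repeated 4-fold protocol described in the abstract.
# Hyperparameters are assumptions; y is assumed binarized to {0, 1}.
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score

def evaluate(X, y, clf, n_splits=4, n_repeats=10, seed=0):
    """Average accuracy, sensitivity, and specificity over repeated folds."""
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    acc, sens, spec = [], [], []
    for train_idx, test_idx in cv.split(X):
        model = make_pipeline(StandardScaler(), clf)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        acc.append(accuracy_score(y[test_idx], pred))
        sens.append(recall_score(y[test_idx], pred, pos_label=1))  # sensitivity
        spec.append(recall_score(y[test_idx], pred, pos_label=0))  # specificity
    return np.mean(acc), np.mean(sens), np.mean(spec)

classifiers = {
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
}
# X: matrix of musical + demographic features; y: arousal or valence labels,
# each analyzed separately as in the paper.
# for name, clf in classifiers.items():
#     print(name, evaluate(X, y, clf))
```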