Hybrid Speech Emotion Recognition System Using Machine Learning and Natural Language Processing
Keywords:
SER, Natural Language Processing, Machine Learning, RAVDESS, EMODB
Abstract
Speech Emotion Recognition (SER) is an essential area of research that aims to enable machines to detect and interpret human emotions, thereby improving human-computer interaction. This study introduces a hybrid SER system that combines machine learning techniques with Natural Language Processing (NLP) to improve the accuracy and robustness of emotion detection. The system integrates acoustic and linguistic features: acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs), pitch, and energy capture the vocal expression of emotion, while linguistic features derived from the textual content are analyzed using sentiment analysis, semantic embeddings, and syntactic patterns. By fusing these complementary feature sets, the proposed hybrid system addresses the limitations of unimodal approaches and provides a more comprehensive analysis of emotional states. Machine learning algorithms, namely Support Vector Machines (SVM), Random Forest, and Gradient Boosting, are used for classification, allowing the system to handle diverse emotional cues. The hybrid system was validated on benchmark SER datasets and outperformed traditional methods in accuracy, particularly in noisy and cross-lingual scenarios. The results demonstrate the benefit of combining machine learning with NLP to recognize emotions in speech and highlight the system's potential in domains such as virtual assistants, mental health monitoring, and customer service automation.
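The abstract does not say whether the acoustic and linguistic features are fused at the feature level or at the decision level. The following is a minimal, stdlib-only sketch that assumes early (feature-level) fusion: each modality is standardized separately, then the two vectors are concatenated per utterance. All names here (`frame_energies`, `acoustic_features`, `linguistic_features`, `zscore_columns`, `fuse`, `TOY_LEXICON`) are hypothetical. Short-time energy statistics stand in for the full MFCC/pitch/energy set, and a tiny sentiment lexicon stands in for the sentiment analysis and embeddings described above.

```python
import math
from typing import List, Sequence

def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy: sum of squared samples in each full frame."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

def acoustic_features(signal: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """Utterance-level [mean, std, max] of frame energy.
    Assumption: a stand-in for MFCC, pitch, and energy descriptors."""
    e = frame_energies(signal, frame_len, hop)
    if not e:  # signal shorter than one frame
        return [0.0, 0.0, 0.0]
    mean = sum(e) / len(e)
    std = math.sqrt(sum((v - mean) ** 2 for v in e) / len(e))
    return [mean, std, max(e)]

# Hypothetical toy lexicon; a real system would use a trained sentiment model
# and semantic embeddings instead.
TOY_LEXICON = {"happy": 1.0, "great": 1.0, "sad": -1.0, "angry": -1.0, "terrible": -1.0}

def linguistic_features(transcript: str) -> List[float]:
    """[mean polarity of sentiment words, fraction of tokens that carry sentiment]."""
    tokens = [t.strip(".,!?").lower() for t in transcript.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return [0.0, 0.0]
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    polarity = sum(scores) / len(scores) if scores else 0.0
    return [polarity, len(scores) / len(tokens)]

def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column to zero mean and unit variance; constant columns become 0."""
    if not rows:
        return []
    n = len(rows)
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (rows[i][j] - mu) / sd if sd > 0 else 0.0
    return out

def fuse(acoustic_rows: List[List[float]], linguistic_rows: List[List[float]]) -> List[List[float]]:
    """Early fusion: standardize each modality separately, then concatenate per utterance."""
    if len(acoustic_rows) != len(linguistic_rows):
        raise ValueError("each utterance needs both acoustic and linguistic features")
    za = zscore_columns(acoustic_rows)
    zl = zscore_columns(linguistic_rows)
    return [a + l for a, l in zip(za, zl)]

if __name__ == "__main__":
    acoustic = [acoustic_features([0.1, -0.2] * 400), acoustic_features([0.5, -0.6] * 400)]
    linguistic = [linguistic_features("I am so happy today"), linguistic_features("This is terrible!")]
    for vec in fuse(acoustic, linguistic):
        print([round(v, 3) for v in vec])
```

Each fused row could then be passed to any of the classifiers named in the abstract (for example, an SVM). Standardizing each modality before concatenation keeps large-valued acoustic features from dominating the small-valued linguistic ones. In a real experiment, the means and standard deviations should be computed on the training split only and reused on test data.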
License
Copyright (c) 2024 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.