Optimization of Convolutional Neural Network Architectures for High-Accuracy Spoken Digit Classification Using Mel-Frequency Cepstral Coefficients

Authors

Pratibha Rashmi, Manu Pratap Singh, Punj Prakash

Keywords:

Convolutional Neural Network, Spoken Digits Recognition, MFCC, Delta MFCC, Global Max Pooling, Deep Learning

Abstract

Sound recognition, as considered here, is the ability of a machine learning system to identify spoken words. Many approaches have been proposed for building automatic sound recognition systems. In recent years, convolutional neural networks (CNNs) have gained acceptance for classification tasks in computer vision and voice assistants because they adapt and learn well and overcome the accuracy limitations of traditional machine learning methods. In this paper, we optimize the CNN architecture and the number of parameters in the network while improving accuracy for the classification of spoken digits. We train the CNNs on features extracted as Mel-frequency cepstral coefficients (MFCCs). We examine two CNN models. The first model uses three convolutional blocks with max-pooling layers, followed by a fully connected network and a classification layer. The second model applies global pooling after the convolutional blocks and passes the resulting feature map directly to the classification layer. Both networks are evaluated on existing spoken-digit datasets, and the results are analyzed. The analysis indicates that the classification accuracy of the proposed optimized CNN architectures surpasses that of existing pre-trained CNN models that use MFCC feature extraction for spoken digit classification.
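To illustrate why the second model can need far fewer parameters than the first, the following minimal sketch counts trainable parameters for both designs. Every number in it is an assumption made for illustration and is not taken from the paper: a 40 x 32 single-channel MFCC input, 3 x 3 kernels with "same" padding, channel widths of 32, 64, and 128, 2 x 2 max pooling after each block, and a 128-unit hidden dense layer in the first model. The helper names are hypothetical.

```python
# Rough parameter-count comparison of the two CNN designs described in the
# abstract. All hyperparameters below are assumed, not taken from the paper.

def conv2d_params(in_ch, out_ch, k=3):
    """Weights plus biases of a k x k 2-D convolution."""
    return k * k * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_stack(input_hw, channels, in_ch=1):
    """Total conv parameters and output shape for blocks of
    'same'-padded conv followed by 2 x 2 max pooling."""
    h, w = input_hw
    total = 0
    for out_ch in channels:
        total += conv2d_params(in_ch, out_ch)
        h, w = h // 2, w // 2   # 2 x 2 max pooling halves each dimension
        in_ch = out_ch
    return total, h, w, in_ch

INPUT_HW = (40, 32)        # assumed: 40 MFCCs x 32 frames
CHANNELS = (32, 64, 128)   # assumed widths of the three conv blocks
HIDDEN = 128               # assumed hidden dense width (model 1 only)
NUM_CLASSES = 10           # digits 0-9

conv_total, h, w, c = conv_stack(INPUT_HW, CHANNELS)

# Model 1: flatten the final feature map, then dense layer + classifier.
model1_params = (conv_total
                 + dense_params(h * w * c, HIDDEN)
                 + dense_params(HIDDEN, NUM_CLASSES))

# Model 2: global pooling reduces each channel to one value, so the
# classifier sees only c inputs regardless of the feature-map size.
model2_params = conv_total + dense_params(c, NUM_CLASSES)

print("model 1 parameters:", model1_params)
print("model 2 parameters:", model2_params)
```

Under these assumptions, nearly all of the difference comes from the dense layer that follows flattening in the first model; the convolutional blocks contribute the same number of parameters to both. The actual counts in the paper depend on its own layer sizes and input shape.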

Published

2024-09-19

How to Cite

Pratibha Rashmi, Manu Pratap Singh, & Punj Prakash. (2024). Optimization of Convolutional Neural Network Architectures for High-Accuracy Spoken Digit Classification Using Mel-Frequency Cepstral Coefficients. Journal of Computational Analysis and Applications (JoCAAA), 33(05), 649–668. Retrieved from http://eudoxuspress.com/index.php/pub/article/view/591

Section

Articles
