JAIT 2024 Vol.15(7): 853-861
doi: 10.12720/jait.15.7.853-861

DHERF: A Deep Learning Ensemble Feature Extraction Framework for Emotion Recognition Using Enhanced-CNN

Shaik Abdul Khalandar Basha and P. M. Durai Raj Vincent *
School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
Email: khalandar.basha2016@vitstudent.ac.in (S.A.K.B.); pmvincent@vit.ac.in (P.M.D.R.V.)
*Corresponding author

Manuscript received January 3, 2024; revised January 17, 2024; accepted February 21, 2024; published July 18, 2024.

Abstract—Artificial Intelligence (AI) based solutions have become indispensable for real-time problems in any field where large volumes of historical data must be analyzed to make accurate predictions. Voice-operated smart AI devices such as Alexa and Siri are a commercial success and are now part of most smart households. Voice-based acoustic datasets can also serve as biomarkers for identifying the emotion conveyed in a speech signal. Deep learning models based on Convolutional Neural Networks (CNN) have already been employed for emotion detection, but they have shown mediocre performance when predictions are drawn from multimedia content analysis. To enhance the performance of CNN-based deep learning algorithms on multimedia content-based datasets, this work proposes a novel configuration framework, the Deep Human Emotion Recognition Framework (DHERF). DHERF exploits multiple selected features from the training dataset in a learning-based approach to enhance prediction accuracy. The experimental study showed that optimized feature selection in training the DHERF model yielded a prediction accuracy of up to 85.70%, compared with maximum prediction accuracies of 71.64% and 81.11% for the conventional CNN baseline and Long Short-Term Memory (LSTM) models, respectively, under the same experimental conditions.
 
Keywords—deep learning, human emotion recognition, Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), baseline CNN, audio-based human emotion recognition

Cite: Shaik Abdul Khalandar Basha and P. M. Durai Raj Vincent, "DHERF: A Deep Learning Ensemble Feature Extraction Framework for Emotion Recognition Using Enhanced-CNN," Journal of Advances in Information Technology, Vol. 15, No. 7, pp. 853-861, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.