Design and Implementation of a Multi-Tier Deep Learning Framework for Robust Facial Emotion Recognition Using CNNs, Hybrid Boosting, and Vision Transformers
Keywords:
Facial Emotion Recognition (FER); Class Balancing; Convolutional Neural Networks (CNNs: VGG16, VGG19, ResNet50, InceptionV3, MobileNet); Boosting Algorithms (AdaBoost, Gradient Boosting, XGBoost); Vision Transformer (ViT); Hybrid Deep Learning Models; Fixed Hyperparameters; FANE Dataset

Abstract
Facial Emotion Recognition (FER) faces challenges such as class imbalance, subtle inter-class variations, and limited model generalizability. This paper proposes a three-tier benchmark using the FANE dataset with nine emotion classes. We compare a baseline Sequential model, five CNN architectures (VGG16, VGG19, ResNet50, InceptionV3, MobileNet), hybrid CNN + Boosting models (AdaBoost, Gradient Boosting, XGBoost), and a custom Vision Transformer (ViT), all trained with fixed hyperparameters. Experiments on both imbalanced and balanced versions of the dataset show that CNN + Boosting performs best after balancing, while ViT benefits significantly from class balancing. The results emphasize the value of standardized training settings and architectural robustness in FER.
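The CNN + Boosting hybrid described above can be sketched as a two-stage pipeline: a frozen CNN backbone produces embeddings, and a boosting classifier is fit on them. The sketch below is illustrative only, not the paper's implementation; random vectors stand in for the CNN embeddings, and all names (`NUM_CLASSES`, `FEATURE_DIM`) are assumptions.

```python
# Illustrative sketch of a CNN + Boosting hybrid (not the paper's exact code).
# A pretrained CNN (e.g. VGG16/ResNet50/MobileNet) would act as a frozen
# feature extractor; here random vectors stand in for its embeddings.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
NUM_CLASSES = 9    # FANE has nine emotion classes
NUM_SAMPLES = 300  # toy sample count for the sketch
FEATURE_DIM = 64   # stand-in for the CNN embedding dimension

# Stand-in for CNN embeddings of face crops and their emotion labels.
X = rng.normal(size=(NUM_SAMPLES, FEATURE_DIM))
y = rng.integers(0, NUM_CLASSES, size=NUM_SAMPLES)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 2: fit a boosting classifier on the extracted features.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)
preds = clf.predict(X_te[:5])  # five per-image class predictions
print(preds.shape)
```

In the real pipeline, `X` would be replaced by pooled activations from one of the five CNN backbones, and AdaBoost or XGBoost could be swapped in for the gradient-boosting stage.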


