JAIT 2024 Vol.15(11): 1252-1263
doi: 10.12720/jait.15.11.1252-1263

Comparative Analysis of Pre-trained Deep Learning Models for Facial Landmark Localization on Enhanced Dataset of Heavily Occluded Face Images

Zieb Rabie Alqahtani 1, Mohd Shahrizal Sunar 1,*, and Abdelmonim M. Artoli 2
1. Media and Game Innovation Centre of Excellence, Institute of Human Centered Engineering,
University of Technology Malaysia, Johor, Malaysia
2. Computer Science Department, College of Computer and Information Sciences, King Saud University,
Riyadh, Saudi Arabia
Email: zralqahtani@graduate.utm.my (Z.R.A.); shahrizal@utm.my (M.S.S.); aartoli@ksu.edu.sa (A.M.A.)
*Corresponding author

Manuscript received February 10, 2023; revised April 28, 2024; accepted May 28, 2024; published November 17, 2024.

Abstract—In the physical world, the face is the primary part of the human body that people read to understand someone's feelings. Likewise, in computer vision, face detection and facial landmark localization are pivotal for applications ranging from facial recognition to emotion analysis and augmented reality. Existing datasets in this domain lack diversity, especially in occluded faces such as those obscured by medical masks or niqabs. Moreover, most of their images were captured in controlled environments with limited variation in pose and lighting. This paper addresses this gap by focusing on occluded face images and localizing five key facial landmarks (the two eyes, the nose, and the two mouth corners). The Niqab dataset was substantially enhanced with 11,000 additional images, predominantly of faces with 80% to 100% occlusion, to form the ENiqab-V1 dataset. Four pre-trained deep learning models were fine-tuned on ENiqab-V1 through transfer learning: three face-specific models known for high accuracy in this domain (MediaPipe, face.evoLVE, and TorchLM) and one general object detection model (YOLOv5). The goal is to compare the models and to propose guidelines for improving their accuracy through further fine-tuning. The models were evaluated on accuracy and Mean Squared Error (MSE), yielding accuracies of 48.56%, 59.62%, 52.8%, and 52.7% and MSEs of 0.78, 0.59, 1.2, and 0.85, respectively. The comparison shows that face.evoLVE achieves the highest accuracy. Nevertheless, for facial landmark localization on heavily occluded face images, we recommend the general object detection model YOLOv5 because of its potential for further accuracy gains through optimization.
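The abstract does not spell out how accuracy and MSE are computed for the five landmarks. The sketch below shows one common way to score five-point predictions. It assumes that MSE is the mean of squared x/y coordinate errors over all faces and landmarks, and that "accuracy" is the fraction of landmarks whose Euclidean error falls below a chosen threshold. Both definitions, the landmark order in `LANDMARK_NAMES`, and the helper names are hypothetical illustrations and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical landmark order (assumption, not the paper's convention).
LANDMARK_NAMES = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def _check(pred: Sequence[Sequence[Point]], gt: Sequence[Sequence[Point]]) -> None:
    """Validate that both inputs hold the same number of five-point faces."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of faces")
    for p_face, g_face in zip(pred, gt):
        if len(p_face) != len(LANDMARK_NAMES) or len(g_face) != len(LANDMARK_NAMES):
            raise ValueError("each face must have exactly five landmarks")


def landmark_mse(pred: Sequence[Sequence[Point]], gt: Sequence[Sequence[Point]]) -> float:
    """Mean squared error over every x and y coordinate of every landmark (assumed definition)."""
    _check(pred, gt)
    sq = [
        (pc - gc) ** 2
        for p_face, g_face in zip(pred, gt)
        for p_pt, g_pt in zip(p_face, g_face)
        for pc, gc in zip(p_pt, g_pt)
    ]
    return sum(sq) / len(sq)


def landmark_accuracy(
    pred: Sequence[Sequence[Point]], gt: Sequence[Sequence[Point]], threshold: float
) -> float:
    """Fraction of landmarks whose Euclidean error is below `threshold` (assumed definition)."""
    _check(pred, gt)
    hits = [
        math.dist(p_pt, g_pt) < threshold
        for p_face, g_face in zip(pred, gt)
        for p_pt, g_pt in zip(p_face, g_face)
    ]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Toy pixel coordinates for one face (made-up values for illustration only).
    gt = [[(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (35.0, 80.0), (65.0, 80.0)]]
    pred = [[(31.0, 40.0), (70.0, 42.0), (50.0, 60.0), (45.0, 80.0), (65.0, 81.0)]]
    print(landmark_mse(pred, gt))                       # 10.6
    print(landmark_accuracy(pred, gt, threshold=3.0))   # 0.8
```

In this toy example, a single badly placed mouth corner (10 pixels off) dominates the MSE but removes only one of five hits from the threshold-based accuracy. This illustrates why the two metrics can rank models differently.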
 
Keywords—object detection, facial landmarks, heavily occluded face, deep learning

Cite: Zieb Rabie Alqahtani, Mohd Shahrizal Sunar, and Abdelmonim M. Artoli, "Comparative Analysis of Pre-trained Deep Learning Models for Facial Landmark Localization on Enhanced Dataset of Heavily Occluded Face Images," Journal of Advances in Information Technology, Vol. 15, No. 11, pp. 1252-1263, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.