A Lip-Reading Model for Tagalog Using Multimodal Deep Learning Approach
Abstract
Purpose – This research aims to develop a Tagalog-specific lip-reading model using a multimodal deep learning approach that focuses on visual and textual information. It addresses the underrepresentation of linguistically diverse languages, such as Tagalog, in lip-reading research. It also aims to enhance communication for native and non-native Tagalog speakers who are deaf or hard of hearing, paving the way for linguistically inclusive AI and lip-reading systems.
Method – The research employs a hybrid multimodal model that combines a convolutional neural network (CNN) with long short-term memory (LSTM), inspired by the LipNet architecture, and integrates facial landmarks and contextual language information.
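As a rough illustration of the kind of architecture described in the Method statement, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes grayscale mouth-region frames and per-frame 2D landmark coordinates as inputs. The class name `LipReadingSketch`, the layer sizes, the number of landmarks, and the vocabulary size are all hypothetical, and the contextual language component is omitted because the abstract does not specify how it is integrated.

```python
import torch
import torch.nn as nn


class LipReadingSketch(nn.Module):
    """Hypothetical CNN + LSTM lip-reading model with landmark fusion."""

    def __init__(self, num_landmarks=20, vocab_size=30, hidden_size=128):
        super().__init__()
        # Spatiotemporal convolutions over (time, height, width).
        # Only the spatial dimensions are pooled, so the frame count is kept.
        self.visual = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.ReLU(),
        )
        # Project the (x, y) landmark coordinates of each frame to 32 features.
        self.landmark_proj = nn.Sequential(
            nn.Linear(num_landmarks * 2, 32),
            nn.ReLU(),
        )
        # Bidirectional LSTM over the fused per-frame features (32 + 32).
        self.lstm = nn.LSTM(64, hidden_size, batch_first=True, bidirectional=True)
        # Per-frame character logits (the vocabulary size is an assumption).
        self.classifier = nn.Linear(2 * hidden_size, vocab_size)

    def forward(self, frames, landmarks):
        # frames:    (batch, 1, time, height, width)
        # landmarks: (batch, time, num_landmarks * 2)
        v = self.visual(frames)        # (batch, 32, time, h', w')
        v = v.mean(dim=(3, 4))         # average over space -> (batch, 32, time)
        v = v.transpose(1, 2)          # (batch, time, 32)
        l = self.landmark_proj(landmarks)   # (batch, time, 32)
        fused = torch.cat([v, l], dim=-1)   # (batch, time, 64)
        out, _ = self.lstm(fused)           # (batch, time, 2 * hidden_size)
        return self.classifier(out)         # (batch, time, vocab_size)


# Usage with random data: 2 clips, 12 frames each, 32x64 pixels, 20 landmarks.
model = LipReadingSketch()
frames = torch.randn(2, 1, 12, 32, 64)
landmarks = torch.randn(2, 12, 40)
logits = model(frames, landmarks)  # one logit vector per frame: shape (2, 12, 30)
```

Concatenating the two feature streams per frame before the recurrent layer is only one possible fusion strategy; it is chosen here because it keeps the visual and landmark sequences aligned in time.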
Results – The proposed Tagalog lip-reading model achieved an increase in processing speed of at least 25% across both the training and evaluation phases without compromising accuracy. After 80 epochs of training, the model reached a validation accuracy of 89.5%.
Conclusion – The research demonstrated the efficacy of the multimodal approach and the advantages of integrating visual and textual information for lip-reading tasks in the Tagalog language. The model achieved strong performance by tailoring its architecture to the unique phonetic features of Tagalog.
Recommendations – Future research can explore the generalizability of the proposed model to other unexplored languages, considering its adaptability to various speaking styles, accents, and noise levels.
Research Implications – The success of this research in producing a lip-reading model for the Tagalog language highlights the importance of combining linguistically diverse datasets with a multimodal approach for broader applications in human-computer interaction.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.