JAIT 2025 Vol.16(2): 233-242
doi: 10.12720/jait.16.2.233-242

Benchmarking Open-Source Large Language Models on Code-Switched Tagalog-English Retrieval Augmented Generation

Aunhel John M. Adoptante 1,*, Jasper Adrian Dwight V. Castro 1, Micholo Lanz B. Medrana 1, Alyssa Patricia B. Ocampo 1, Elmer C. Peramo 1, and Melissa Ruth M. Miranda 2
1. Department of Science and Technology, Computer Software Division, Advanced Science and Technology Institute, Diliman, Quezon City, Philippines
2. Pamantasan ng Lungsod ng Maynila, Intramuros, Manila, Philippines
Email: aunheljohn.adoptante@asti.dost.gov.ph (A.J.M.A.); jasperadriandwight.castro@asti.dost.gov.ph (J.A.D.V.C.); michololanz.medrana@asti.dost.gov.ph (M.L.B.M.); alyssapatricia.ocampo@asti.dost.gov.ph (A.P.B.O.); elmer@asti.dost.gov.ph (E.C.P.); mrmmiranda15@gmail.com (M.R.M.M.)
*Corresponding author

Manuscript received August 28, 2024; revised September 30, 2024; accepted November 11, 2024; published February 17, 2025.

Abstract—Code-switching, the alternation between languages within a sentence or discourse, poses significant challenges in Natural Language Processing (NLP). Effective NLP systems must employ advanced modeling techniques to accurately process and generate code-switched text, capturing its linguistic nuances and contextual dependencies. This study evaluates five open-source Large Language Models (LLMs)—Mistral, SeaLLM, Falcon, Phi-3-mini, and Gemma—on their performance with code-switched Tagalog-English (Taglish) queries in a Retrieval-Augmented Generation (RAG) task. Mistral and SeaLLM outperformed the others in generating contextually grounded and relevant answers, attributed to their advanced architectures and extensive multilingual training data. In contrast, Falcon, Phi-3-mini, and Gemma struggled to handle code-switching effectively. The models performed best on English queries, followed by Taglish, with the lowest performance on Tagalog queries, highlighting the need for more balanced training data for low-resource languages. Additionally, the models retrieved context better from .pdf documents than from .txt files. Future research should focus on analyzing language composition in datasets, investigating real-world code-switching behavior, and studying the effects of code-switching on model understanding to enhance NLP accessibility for underrepresented languages.
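For readers unfamiliar with the RAG setup referenced in the abstract, the sketch below illustrates how a code-switched Taglish query might be answered against retrieved document context. It is a minimal, generic outline, not the authors' evaluation pipeline: the embedding model, the sample document chunks, and the generate() placeholder are assumptions for demonstration only.

```python
# Minimal illustrative RAG loop for a code-switched Taglish query.
# Assumptions: sentence-transformers for retrieval, a placeholder generate()
# standing in for whichever open-source LLM is being benchmarked.
from sentence_transformers import SentenceTransformer, util

# Hypothetical document chunks (e.g., extracted from .pdf or .txt sources).
chunks = [
    "Enrollment for the second semester opens on January 8.",
    "Ang deadline ng pagbabayad ng tuition ay Enero 20.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

query = "Kailan ba ang start ng enrollment for next sem?"  # Taglish query
query_vec = embedder.encode(query, convert_to_tensor=True)

# Retrieve the chunk most similar to the query as grounding context.
scores = util.cos_sim(query_vec, chunk_vecs)[0]
context = chunks[int(scores.argmax())]

# Combine retrieved context with the code-switched query in a prompt;
# generate() is a placeholder for the LLM call (e.g., Mistral or SeaLLM).
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
# answer = generate(prompt)
```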
 
Keywords—Large Language Models (LLM), code switching, retrieval augmented generation

Cite: Aunhel John M. Adoptante, Jasper Adrian Dwight V. Castro, Micholo Lanz B. Medrana, Alyssa Patricia B. Ocampo, Elmer C. Peramo, and Melissa Ruth M. Miranda, "Benchmarking Open-Source Large Language Models on Code-Switched Tagalog-English Retrieval Augmented Generation," Journal of Advances in Information Technology, Vol. 16, No. 2, pp. 233-242, 2025. doi: 10.12720/jait.16.2.233-242

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
