JAIT 2023 Vol.14(2): 233-241
doi: 10.12720/jait.14.2.233-241

Clickbait Detection in Indonesian News Title with Gray Unbalanced Class Based on BERT

Pulung Nurtantio Andono, Pieter Santoso Hadi, Muljono *, and Catur Supriyanto
Informatics Engineering, Dian Nuswantoro University, Semarang, Indonesia;
Email: pulung.nurtantio.andono@dsn.dinus.ac.id (P.N.A.), p31201902285@mhs.dinus.ac.id (P.S.H.), catur.supriyanto@dsn.dinus.ac.id (C.S.)
*Correspondence: muljono@dsn.dinus.ac.id (M.)

Manuscript received July 1, 2022; revised August 21, 2022; accepted October 7, 2022; published March 17, 2023.

Abstract—Bahasa Indonesia is used by about 263 million people worldwide, yet it is classified as an under-resourced language. Clickbait has gained attention in news analysis in recent years; however, resources for clickbait tasks in Indonesian are still lacking. Clickbait attracts readers' attention even though the content is uninformative or misleading. Clickbait datasets are also imbalanced: the classes are unequally distributed, which affects classification results. In this research, focal loss is proposed to improve classification accuracy without reducing the amount of original data. Clickbait data are normally separated into two classes, namely clickbait and non-clickbait. However, some titles are difficult to categorize, even for humans. Therefore, this study categorizes titles into three categories: clickbait, non-clickbait, and gray-clickbait. The proposed method achieves an accuracy of 93.4% in two-class classification, outperforming previous studies, but only 73.3% in three-class classification. Our research shows a high similarity between gray-clickbait and clickbait data, which makes classification more challenging. In addition, news titles alone are not sufficient to achieve better performance in three-class clickbait classification.
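To make the role of focal loss concrete, the following is a minimal sketch of the standard multi-class focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), as introduced by Lin et al. for object detection. The abstract does not state the paper's exact settings, so the focusing parameter `gamma`, the per-class weights `alpha`, and the helper `inverse_frequency_alpha` are illustrative assumptions rather than the authors' implementation. The class order and counts in the usage example are hypothetical and are not statistics from the paper's dataset.

```python
import math
from typing import Optional, Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Log-sum-exp with the max subtracted, so large logits do not overflow
    # and small probabilities do not underflow to log(0).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def focal_loss(
    logits: Sequence[float],
    target: int,
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Focal loss for a single example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma = 0 with no alpha reduces to ordinary cross-entropy; larger gamma
    down-weights easy, confidently classified examples.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def inverse_frequency_alpha(counts: Sequence[int]) -> list[float]:
    # Hypothetical helper: weight class k by N / (K * n_k), so rarer classes
    # contribute more to the loss than frequent ones.
    n, k = sum(counts), len(counts)
    return [n / (k * c) for c in counts]


# Hypothetical class order: [non-clickbait, clickbait, gray-clickbait].
alpha = inverse_frequency_alpha([600, 300, 100])  # made-up counts
logits = [2.0, 0.5, -1.0]  # made-up model output for one news title

ce = focal_loss(logits, target=0, gamma=0.0)       # plain cross-entropy
fl = focal_loss(logits, target=0, gamma=2.0)       # easy example, down-weighted
fl_rare = focal_loss(logits, target=2, alpha=alpha)  # rare, misclassified class
```

Because the loss keeps every training example and only reweights it, this approach addresses imbalance without undersampling the majority class, which matches the abstract's aim of not reducing the original data.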
 
Keywords—classification, imbalanced data, BERT, focal loss, clickbait, Indonesian

Cite: Pulung Nurtantio Andono, Pieter Santoso Hadi, Muljono, and Catur Supriyanto, "Clickbait Detection in Indonesian News Title with Gray Unbalanced Class Based on BERT," Journal of Advances in Information Technology, Vol. 14, No. 2, pp. 233-241, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.