JAIT 2024 Vol.15(5): 602-613
doi: 10.12720/jait.15.5.602-613

Empirical Text Analysis for Identifying the Genres of Bengali Literary Work

Ayesha Afroze, Kishowloy Dutta, Sadman Sadik, Sadia Khanam, Raqeebir Rab *, and Mohammad Asifur Rahim
Department of Computer Science and Engineering, Ahsanullah University of Science and Technology (AUST),
Dhaka, Bangladesh
Email: ayeshaafrozeaust@gmail.com (A.A.); kishowloydatta016@gmail.com (K.D.);
sadmansadikhasan@gmail.com (S.S.); sadiakhanamarni111@gmail.com (S.K.);
raqeebir.cse@aust.edu (R.R.); mohammadasifurrahim@gmail.com (M.A.R.)
*Corresponding author

Manuscript received October 24, 2023; revised November 30, 2023; accepted January 25, 2024; published May 16, 2024.

Abstract—Digital books and online retailers are growing in popularity every day, and different readers prefer different genres of literature. Genre classification, the process of assigning a book to a genre, makes it easier for readers to discover books that match their tastes. In this paper, we classify books by genre from their titles and snippets using a variety of traditional machine learning and deep learning models. Such work exists for books in other languages but has not yet been done for Bengali novels. For this research, we collected data and built two datasets: one contains the titles of Bengali novels across nine genres, and the other contains book snippets from three genres. For classification, we employed logistic regression, Support Vector Machines (SVM), random forests, decision trees, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). Among all the models, BERT performed best on both datasets, with 90% accuracy on the snippets dataset and 77% accuracy on the titles dataset. Excluding BERT, the traditional machine learning models performed better on the snippets dataset, whereas the deep learning models performed better on the titles dataset. We attribute these differences in performance to the size of each dataset and the number of words in its samples.
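The abstract names logistic regression among the traditional baselines applied to titles and snippets. As an illustration only, here is a minimal, standard-library-only sketch of a multinomial (softmax) logistic regression over bag-of-words counts, showing how such a classifier maps a title to a genre label. It is not the authors' implementation: the whitespace tokenizer, the hyperparameters (`lr`, `epochs`, `l2`), the class name `SoftmaxGenreClassifier`, and the helper functions are all hypothetical, and the abstract does not describe the paper's actual features or preprocessing.

```python
import math
import random
import string
from collections import Counter

# Characters stripped from token edges: ASCII punctuation plus the Bengali danda (U+0964).
PUNCT = string.punctuation + "\u0964"


def tokenize(text):
    """Hypothetical whitespace tokenizer; the paper's preprocessing may differ."""
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def featurize(text):
    """Bag-of-words term counts for one title or snippet."""
    return Counter(tokenize(text))


class SoftmaxGenreClassifier:
    """Hypothetical multinomial logistic regression trained with plain SGD."""

    def __init__(self, lr=0.5, epochs=200, l2=1e-3, seed=0):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.rng = random.Random(seed)
        self.labels = []
        self.weights = {}  # label -> {feature: weight}
        self.bias = {}     # label -> float

    def _probs(self, feats):
        scores = {
            c: self.bias[c] + sum(self.weights[c].get(f, 0.0) * v for f, v in feats.items())
            for c in self.labels
        }
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.weights = {c: {} for c in self.labels}
        self.bias = {c: 0.0 for c in self.labels}
        data = [(featurize(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            self.rng.shuffle(data)
            for feats, y in data:
                probs = self._probs(feats)
                for c in self.labels:
                    # Cross-entropy gradient w.r.t. the score of class c: p_c - 1[c == y]
                    g = probs[c] - (1.0 if c == y else 0.0)
                    self.bias[c] -= self.lr * g
                    w = self.weights[c]
                    for f, v in feats.items():
                        # L2 shrinkage applied lazily, only to features active in this example
                        old = w.get(f, 0.0)
                        w[f] = old - self.lr * (g * v + self.l2 * old)
        return self

    def predict_proba(self, text):
        return self._probs(featurize(text))

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted genre equals the gold label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)
```

Usage would be `SoftmaxGenreClassifier().fit(titles, genres)` followed by `predict(title)` on a new title, with `accuracy` computed on held-out data; the paper's train/test split is not given in the abstract. A linear model over word counts like this one depends entirely on which words appear in a sample, which is one plausible reason short titles and longer snippets could favor different model families.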
 
Keywords—Genre, Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machines (SVM), Natural Language Processing (NLP), Book Snippets, Recurrent Neural Networks (RNN)

Cite: Ayesha Afroze, Kishowloy Dutta, Sadman Sadik, Sadia Khanam, Raqeebir Rab, and Mohammad Asifur Rahim, "Empirical Text Analysis for Identifying the Genres of Bengali Literary Work," Journal of Advances in Information Technology, Vol. 15, No. 5, pp. 602-613, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.