JAIT 2024 Vol.15(6): 714-722
doi: 10.12720/jait.15.6.714-722

A Method for Distinguishing Model Generated Text and Human Written Text

Hinari Shimada * and Masaomi Kimura
College of Engineering, Computer Science and Engineering, Shibaura Institute of Technology, Koto, Japan
Email: al21822@shibaura-it.ac.jp (H.S.); masaomi@shibaura-it.ac.jp (M.K.)
*Corresponding author

Manuscript received January 11, 2024; revised February 1, 2024; accepted February 22, 2024; published June 13, 2024.

Abstract—With the rapid development of Large Language Models (LLMs) such as ChatGPT, it has become extremely difficult for humans to accurately detect whether text was written by an LLM. In academic settings in particular, human evaluators need assistance in telling the two kinds of text apart. Assignments such as essays and theses typically require human authors to write the content. However, advanced LLMs such as ChatGPT can generate text effortlessly, potentially allowing class assignments to be completed without human effort. Because this has a significant impact on the fair evaluation of students, we need to distinguish between text generated by a model (model generated text) and text written by a human (human written text). Detection using existing statistical measures, such as log likelihoods, does not perform well for black-box models such as ChatGPT because it requires access to the internals of the models. Therefore, we propose a new approach that captures text from two different perspectives, using log likelihoods and sentence embeddings computed with multiple LLMs. In experiments on data that included text generated by the black-box model ChatGPT, our proposed method achieved higher accuracy than existing approaches.
 
Keywords—Large Language Models (LLMs), model generated text, human written text, log likelihoods, sentence embeddings
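
As a rough illustration of the "two perspectives" idea in the abstract, the sketch below combines a likelihood-based view and an embedding-based view of a single text into one feature vector. It is not the authors' implementation: the abstract does not say how the two views are combined or which models are used, so the function names (`mean_log_likelihood`, `build_features`), the use of the mean per-token log probability, the simple concatenation, and all input values are assumptions made for illustration only. In practice the per-token log probabilities and embeddings would come from whichever LLMs are available for scoring.

```python
from statistics import mean
from typing import List, Sequence


def mean_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Average per-token log probability of a text under one scoring model."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return mean(token_logprobs)


def build_features(
    logprobs_per_model: Sequence[Sequence[float]],
    embeddings_per_model: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate both views of one text (an assumed combination scheme)."""
    # View 1: one summary log-likelihood value per scoring model.
    likelihood_view = [mean_log_likelihood(lp) for lp in logprobs_per_model]
    # View 2: the sentence embeddings from each model, flattened in order.
    embedding_view = [x for emb in embeddings_per_model for x in emb]
    return likelihood_view + embedding_view


# Hypothetical numbers for one text scored by two models.
features = build_features(
    logprobs_per_model=[[-1.0, -3.0], [-2.0, -2.0, -5.0]],
    embeddings_per_model=[[0.5, -0.25], [0.125]],
)
print(features)  # [-2.0, -3.0, 0.5, -0.25, 0.125]
```

A vector like `features` could then be passed to any binary classifier trained on labeled model generated and human written examples; the choice of classifier is likewise left open here.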

Cite: Hinari Shimada and Masaomi Kimura, "A Method for Distinguishing Model Generated Text and Human Written Text," Journal of Advances in Information Technology, Vol. 15, No. 6, pp. 714-722, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.