JAIT 2024 Vol.15(6): 723-734
doi: 10.12720/jait.15.6.723-734

Bridging the Gap: A Hybrid Approach to Medical Relation Extraction Using Pretrained Language Models and Traditional Machine Learning

Nesma A. Hassan 1, Rania A. Abul Seoud 2, and Dina A. Salem 1,*
1. Department of Computer and Software Engineering, Faculty of Engineering,
Misr University for Science and Technology, Giza, Egypt
2. Department of Electrical Engineering, Faculty of Engineering, Fayoum University, Fayoum, Egypt
Email: nesma.abdelaziz@must.edu.eg (N.A.H.); raa00@fayoum.edu.eg (R.A.A.S.); dina.almahdy@must.edu.eg (D.A.S.)
*Corresponding author

Manuscript received November 19, 2023; revised February 19, 2024; accepted March 14, 2024; published June 13, 2024.

Abstract—Feature engineering is time-consuming and challenging, requiring expertise in Natural Language Processing (NLP) techniques and methods. The objective of this study was to explore the use of contextual word embeddings, specifically those generated by Bidirectional Encoder Representations from Transformers (BERT), for biomedical relation extraction. The study applied machine learning models, including Support Vector Machine, Random Forest, and K-Nearest Neighbor classifiers, to classify relationships between medical entities based on these embeddings. The attention mechanism of a pre-trained BERT model was also used to capture information about the relationship between medical entities, enabling more advanced biomedical relation extraction. The machine learning classifiers were evaluated as relation classification models. The proposed approach outperformed the most recent state-of-the-art models on two publicly available biomedical relation extraction datasets, Chemical-Protein Interactions (ChemProt) and Drug-Drug Interactions (DDI), indicating that traditional machine learning techniques can compete with recent advancements. The proposed model achieves an F1-score of 0.778 on the ChemProt dataset and 0.815 on the DDI dataset. This study demonstrates the potential of contextual word embeddings combined with machine learning models for biomedical relation extraction, without the need for extensive manual feature engineering.
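As a rough illustration of the pipeline the abstract describes (BERT contextual embeddings fed into a traditional classifier), the following minimal Python sketch uses Hugging Face transformers and scikit-learn. It is not the authors' exact implementation: the "bert-base-uncased" checkpoint stands in for whichever biomedical BERT variant the paper uses, and the entity-masked sentences and labels are placeholder data.

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

# Load a pre-trained BERT encoder (a generic checkpoint here; the paper
# would use a biomedical variant).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

def embed(sentence: str) -> list:
    """Return the [CLS] contextual embedding for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    # Use the [CLS] token vector as a fixed-size sentence representation.
    return outputs.last_hidden_state[0, 0, :].tolist()

# Placeholder training data: sentences with masked entity pairs and
# hypothetical relation labels (e.g., DDI interaction types).
sentences = [
    "@DRUG$ increases the plasma concentration of @DRUG$.",
    "@DRUG$ and @DRUG$ were administered with no observed interaction.",
]
labels = ["mechanism", "no_interaction"]

# Train an SVM on the BERT embeddings and classify a sentence.
X = [embed(s) for s in sentences]
clf = SVC(kernel="rbf")
clf.fit(X, labels)
print(clf.predict([embed(sentences[0])]))

The same embeddings could be passed to a Random Forest or K-Nearest Neighbor classifier in place of the SVM, which is the comparison the study reports.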
 
Keywords—Bidirectional Encoder Representations from Transformers (BERT), Chemical-Protein Interactions (ChemProt), Drug-Drug Interactions (DDI), K-Nearest Neighbor (KNN), Natural Language Processing (NLP), random forest, relation extraction, Support Vector Machine (SVM)

Cite: Nesma A. Hassan, Rania A. Abul Seoud, and Dina A. Salem, "Bridging the Gap: A Hybrid Approach to Medical Relation Extraction Using Pretrained Language Models and Traditional Machine Learning," Journal of Advances in Information Technology, Vol. 15, No. 6, pp. 723-734, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.