SafetyRAG: Towards Safe Large Language Model-Based Application through Retrieval-Augmented Generation

Home > Published Issues > 2025 > Volume 16, No. 2, 2025 >

JAIT 2025 Vol.16(2): 243-250
doi: 10.12720/jait.16.2.243-250

Sihem Omri 1,*, Manel Abdelkader 2, and Mohamed Hamdi 1

1. Higher School of Communication of Tunis, University of Carthage, Tunis, Tunisia
2. Tunis Business School, University of Tunis, Tunis, Tunisia
Email: sihem.omri@supcom.tn (S.O.); manel.abdelkader@gmail.com (M.A.); mmh@supcom.tn (M.H.)
*Corresponding author

Manuscript received September 23, 2024; revised October 17, 2024; accepted November 27, 2024; published February 17, 2025.

Abstract—Large Language Model (LLM) agents have proven to perform exceptionally well in a wide range of applications, mainly because of their advanced reasoning, use of external tools and information, use of APIs, and ability to interact with environments. They often use a Retrieval-Augmented Generation (RAG) mechanism as a memory module that retrieves relevant information with similar embedding from knowledge bases. However, recent studies indicate LLM-based applications are highly susceptible to malicious uses such as prompt injections (or jailbreaking). To address this issue, most existing techniques rely on filters for detection and blocking, fine-tuning, and reinforcement learning. However, these methods often necessitate complex and resource-intensive data collection and training procedures. In this work, we propose the SafetyRAG framework, a straightforward solution which utilizes the RAGtechnique to explicitly provide to the LLM some safety facts about unsafe prompts during the generation of its response. This approach employs an external knowledge base of safety facts about various unsafe prompts to enhance the decision-making and safe behavior of an LLM. The solution is evaluated with several LLM models, GPT-3.5-turbo, Llama2-13B, Gemma-7B and Mistral-7B- instruct. The results show a significant improvement in all the models’ ability to decline unsafe requests, even with prompt injections.

Keywords—Large Language Model (LLM), Retrieval-Augmented Generation (RAG), chatbot, prompt injection, jailbreaking, LLM safety

Cite: Sihem Omri, Manel Abdelkader, and Mohamed Hamdi, "SafetyRAG: Towards Safe Large Language Model-Based Application through Retrieval-Augmented Generation," Journal of Advances in Information Technology, Vol. 16, No. 2, pp. 243-250, 2025. doi: 10.12720/jait.16.2.243-250

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Click to download

PREVIOUS PAPER

Benchmarking Open-Source Large Language Models on Code-Switched Tagalog-English Retrieval Augmented Generation

NEXT PAPER

Enhancing ECG Images Using Wave Translation Algorithm with CWT—The Coronary Atherosclerosis Detection

Home

Author Guide

Editor Guide

Reviewer Guide

Published Issues

Special Issue

Sections and Topics

journal menu

General Information

Editor-in-Chief

Prof. Kin C. Yow

What's New

Home > Published Issues > 2025 > Volume 16, No. 2, 2025 >

SafetyRAG: Towards Safe Large Language Model-Based Application through Retrieval-Augmented Generation

Article Metrics in Dimensions