Home > Published Issues > 2025 > Volume 16, No. 2, 2025 >
JAIT 2025 Vol.16(2): 243-250
doi: 10.12720/jait.16.2.243-250

SafetyRAG: Towards Safe Large Language Model-Based Application through Retrieval-Augmented Generation

Sihem Omri 1,*, Manel Abdelkader 2, and Mohamed Hamdi 1
1. Higher School of Communication of Tunis, University of Carthage, Tunis, Tunisia
2. Tunis Business School, University of Tunis, Tunis, Tunisia
Email: sihem.omri@supcom.tn (S.O.); manel.abdelkader@gmail.com (M.A.); mmh@supcom.tn (M.H.)
*Corresponding author

Manuscript received September 23, 2024; revised October 17, 2024; accepted November 27, 2024; published February 17, 2025.

Abstract—Large Language Model (LLM) agents have proven to perform exceptionally well in a wide range of applications, mainly because of their advanced reasoning, use of external tools and information, use of APIs, and ability to interact with environments. They often use a Retrieval-Augmented Generation (RAG) mechanism as a memory module that retrieves relevant information with similar embedding from knowledge bases. However, recent studies indicate LLM-based applications are highly susceptible to malicious uses such as prompt injections (or jailbreaking). To address this issue, most existing techniques rely on filters for detection and blocking, fine-tuning, and reinforcement learning. However, these methods often necessitate complex and resource-intensive data collection and training procedures. In this work, we propose the SafetyRAG framework, a straightforward solution which utilizes the RAGtechnique to explicitly provide to the LLM some safety facts about unsafe prompts during the generation of its response. This approach employs an external knowledge base of safety facts about various unsafe prompts to enhance the decision-making and safe behavior of an LLM. The solution is evaluated with several LLM models, GPT-3.5-turbo, Llama2-13B, Gemma-7B and Mistral-7B- instruct. The results show a significant improvement in all the models’ ability to decline unsafe requests, even with prompt injections.
 
Keywords—Large Language Model (LLM), Retrieval-Augmented Generation (RAG), chatbot, prompt injection, jailbreaking, LLM safety

Cite: Sihem Omri, Manel Abdelkader, and Mohamed Hamdi, "SafetyRAG: Towards Safe Large Language Model-Based Application through Retrieval-Augmented Generation," Journal of Advances in Information Technology, Vol. 16, No. 2, pp. 243-250, 2025. doi: 10.12720/jait.16.2.243-250

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Article Metrics in Dimensions