RAG
A step-by-step guide on how to implement Retrieval-Augmented Generation with the Neuron AI framework.
Retrieval-Augmented Generation (RAG) is the process of providing an LLM with references to a knowledge base outside of its training data sources before generating a response.
Large Language Models (LLMs) are trained on vast volumes of data to be able to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model.
It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful, while also working on your own private data.
Why RAG systems are relevant
Building a RAG system is the way to apply the powerful capabilities of LLMs to your own private data. You can create applications capable of accurately answering questions about a company's internal documentation, or a chatbot that serves external customers based on the internal rules of an organization.
When private data is not involved, you can think of RAG as a way to provide the latest research, statistics, or news to the generative model.
How to create a RAG system
Without RAG, the LLM takes the user input and creates a response based on information it was trained on—or what it already knows.
With RAG, an information retrieval component is introduced. It utilizes the user input to first pull information from a new data source. The user query and the relevant information retrieved are both given to the LLM. The LLM uses the new knowledge and its training data to create better responses. The following sections provide an overview of the process.
Even if it can appear a little complicated, don't worry: this overview is just to make you aware of the process. Most of these steps are automatically managed by the Neuron RAG agent.
There are three main steps to create a RAG system.
Process external data
The external data you want to use to augment the default LLM knowledge may exist in various formats like files, database records, or long-form text.
Before you can submit this data to the LLM, you have to convert it into a specific format called "embeddings".
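To make this step concrete, here is a minimal sketch of turning a piece of text into an embedding with an embeddings provider. The OpenAIEmbeddingsProvider class and the embedText() method are assumptions used for illustration only; check the embeddings components shipped with your version of Neuron for the exact class and method names.

```php
<?php

use NeuronAI\RAG\Embeddings\OpenAIEmbeddingsProvider;

// Hypothetical sketch: the class and method names are assumptions,
// used only to illustrate the concept of text-to-embedding conversion.
$provider = new OpenAIEmbeddingsProvider(
    key: 'OPENAI_API_KEY',
    model: 'text-embedding-3-small',
);

// An embedding is an array of floats capturing the meaning of the text.
// This is the format the vector store expects when you index your documents.
$vector = $provider->embedText('Neuron is a PHP framework to create AI agents.');
```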
Retrieve relevant information
The embeddings you generate by processing documents and data need to be stored in a specific kind of database able to deal with their format. These databases are called "vector stores".
A vector store is not only able to store this data, but also to perform a particular kind of search, called "similarity search", between the existing data in the database and a query we provide.
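To give an intuition of what similarity search means, the snippet below compares two embedding vectors with cosine similarity, the metric most vector stores use under the hood. It is plain PHP for illustration, not part of the Neuron API: the vector store runs this kind of comparison for you.

```php
<?php

// Cosine similarity between two embedding vectors:
// values close to 1 mean the underlying texts are semantically similar.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// The vector store compares the query embedding against every stored embedding
// and returns the documents with the highest similarity scores.
$similarity = cosineSimilarity([0.12, 0.85, 0.33], [0.10, 0.80, 0.40]);
```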
Augment the LLM prompt
Next, the RAG agent augments your input (or prompt) by adding the relevant retrieved data to the context, based on your query.
You just need to take care of the first step "Process external data", and Neuron gives you the toolkit to make it simple. The other steps are automatically managed by the Neuron RAG agent.
Implement a RAG Agent
For RAG use cases you must extend the NeuronAI\RAG\RAG class instead of the default Agent class. To create a RAG agent you need to attach some additional components other than the AI provider, such as a vector store and an embeddings provider.
Here is an example of a RAG implementation:
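The snippet below is a minimal sketch of such an implementation. The concrete components used here (Anthropic as AI provider, Voyage as embeddings provider, Pinecone as vector store), their class names, and their constructor parameters are one possible combination, taken as assumptions; swap in whatever providers and stores your application actually uses.

```php
<?php

namespace App\Neuron;

use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\Anthropic\Anthropic;
use NeuronAI\RAG\Embeddings\EmbeddingsProviderInterface;
use NeuronAI\RAG\Embeddings\VoyageEmbeddingsProvider;
use NeuronAI\RAG\RAG;
use NeuronAI\RAG\VectorStore\PineconeVectorStore;
use NeuronAI\RAG\VectorStore\VectorStoreInterface;

class MyChatBot extends RAG
{
    // The AI provider that generates the final answer.
    protected function provider(): AIProviderInterface
    {
        return new Anthropic(
            key: 'ANTHROPIC_API_KEY',
            model: 'ANTHROPIC_MODEL',
        );
    }

    // The embeddings provider that converts the user query into a vector.
    protected function embeddings(): EmbeddingsProviderInterface
    {
        return new VoyageEmbeddingsProvider(
            key: 'VOYAGE_API_KEY',
            model: 'VOYAGE_MODEL',
        );
    }

    // The vector store where the document embeddings are saved and searched.
    protected function vectorStore(): VectorStoreInterface
    {
        return new PineconeVectorStore(
            key: 'PINECONE_API_KEY',
            indexUrl: 'PINECONE_INDEX_URL',
        );
    }
}
```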
We have to assume that the vector store attached to the RAG agent is already populated with the embeddings representing the data we want to integrate during LLM interactions.
Talk to the chat bot
Imagine you have previously populated a vector store with your software documentation, and now you want to ask questions about something you didn't clearly understand.
To start the execution of a RAG agent you must use the answer() method instead of chat().
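As a rough example, assuming the MyChatBot class sketched above, a question could be asked like this. The make() factory and the getContent() accessor reflect the framework's usual conventions but should be treated as assumptions and verified against your version.

```php
<?php

use NeuronAI\Chat\Messages\UserMessage;

// Ask a question against the knowledge stored in the vector store.
$response = MyChatBot::make()->answer(
    new UserMessage('How do I configure the retry policy described in the docs?')
);

// Print the generated answer.
echo $response->getContent();
```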