RAG
Step by step guide on how to implement Retrieval-Augmented Generation with the Neuron AI framework.
Retrieval-Augmented Generation (RAG) is the process of having the LLM reference a knowledge base outside of its training data sources before generating a response.
Large Language Models (LLMs) are trained on vast volumes of data to be able to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model.
It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful, even when working with your own private data.
Building a RAG system is the way to apply the powerful capabilities of LLMs to your own private data. You can create applications capable of accurately answering questions about a company's internal documentation, or a chatbot that serves external customers based on an organization's internal rules.
Beyond private data, you can also think of RAG as a way to provide the latest research, statistics, or news to generative models.
Without RAG, the LLM takes the user input and creates a response based on information it was trained on—or what it already knows.
With RAG, an information retrieval component is introduced. It utilizes the user input to first pull information from a new data source. The user query and the relevant information retrieved are both given to the LLM. The LLM uses the new knowledge and its training data to create better responses. The following sections provide an overview of the process.
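To make the flow concrete, here is a purely illustrative PHP sketch of the retrieve-augment-generate loop. The $retrieve and $generate closures are hypothetical stand-ins for the retrieval component and the LLM call, not part of the Neuron API; a Neuron RAG agent performs these steps for you.

```php
<?php

// Hypothetical stand-ins for the retrieval component and the LLM call.
// In a real application the Neuron RAG agent handles both steps internally.
$retrieve = fn (string $query): array => ['...documents matching the query...'];
$generate = fn (string $prompt): string => '...LLM response...';

$userQuery = 'How does the retry mechanism work?';

// 1. Retrieval: pull content related to the query from the knowledge base.
$context = $retrieve($userQuery);

// 2. Augmentation: add the retrieved content to the prompt as extra context.
$prompt = "Use the following context to answer the question.\n\n"
    . "Context:\n".implode("\n---\n", $context)."\n\n"
    . "Question: {$userQuery}";

// 3. Generation: the LLM answers using both its training data and the new context.
echo $generate($prompt);
```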
Even if it may seem a bit complicated, don't worry: this is just to make you aware of the process. Most of these steps are automatically managed by a Neuron RAG agent.
There are three main steps to create a RAG system.
The external data you want to use to augment the default LLM knowledge may exist in various formats like files, database records, or long-form text.
Before you can submit this data to the LLM, you have to convert it into a numerical vector representation called "embeddings".
The embeddings you generate by processing documents and data need to be stored in databases designed to handle this format, called "vector stores".
A vector store is not only able to store this data, but also to perform a "similarity search" between the existing data in the database and a query you provide.
Next, the RAG agent augments your input (or prompt) by adding the relevant retrieved data to the context based on your query.
You just need to take care of the first step "Process external data", and Neuron gives you the toolkit to make it simple. The other steps are automatically managed by the Neuron RAG agent.
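As a rough sketch of what that first step can look like: the class and method names below (FileDataLoader, embedDocuments, addDocuments, FileVectorStore) are assumptions based on the general shape of Neuron's document-loading toolkit, so check the framework documentation for the exact API and adapt the paths and providers to your project.

```php
<?php

use NeuronAI\RAG\DataLoader\FileDataLoader;
use NeuronAI\RAG\Embeddings\OpenAIEmbeddingsProvider;
use NeuronAI\RAG\VectorStore\FileVectorStore;

// Load and split your source material into documents (assumed loader API).
$documents = FileDataLoader::for(__DIR__.'/documentation')->getDocuments();

// Convert the documents into embeddings (assumed embeddings provider API).
$embeddings = new OpenAIEmbeddingsProvider(
    key: 'OPENAI_API_KEY',
    model: 'OPENAI_EMBEDDINGS_MODEL',
);

// Store the embedded documents in the vector store (assumed vector store API).
$store = new FileVectorStore(directory: __DIR__.'/storage');
$store->addDocuments($embeddings->embedDocuments($documents));
```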
For RAG use cases, you must extend the NeuronAI\RAG\RAG class instead of the default Agent class.
To create a RAG you need to attach some additional components other than the AI provider, such as a vector store and an embeddings provider.
Here is an example of a RAG implementation:
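Below is a minimal sketch of a RAG agent. It assumes an Anthropic chat model, a Voyage AI embeddings provider, and a Pinecone vector store; the exact provider class names and constructor arguments may differ in your installation, so verify them against the framework documentation and swap in the services you actually use.

```php
<?php

namespace App\Neuron;

use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\Anthropic\Anthropic;
use NeuronAI\RAG\Embeddings\EmbeddingsProviderInterface;
use NeuronAI\RAG\Embeddings\VoyageEmbeddingsProvider;
use NeuronAI\RAG\RAG;
use NeuronAI\RAG\VectorStore\PineconeVectorStore;
use NeuronAI\RAG\VectorStore\VectorStoreInterface;

class MyChatBot extends RAG
{
    // The LLM used to generate the final answer.
    protected function provider(): AIProviderInterface
    {
        return new Anthropic(
            key: 'ANTHROPIC_API_KEY',
            model: 'ANTHROPIC_MODEL',
        );
    }

    // The embeddings provider used to turn the user query into a vector.
    protected function embeddings(): EmbeddingsProviderInterface
    {
        return new VoyageEmbeddingsProvider(
            key: 'VOYAGE_API_KEY',
            model: 'VOYAGE_MODEL',
        );
    }

    // The vector store that holds the embeddings of your documents.
    protected function vectorStore(): VectorStoreInterface
    {
        return new PineconeVectorStore(
            key: 'PINECONE_API_KEY',
            indexUrl: 'PINECONE_INDEX_URL',
        );
    }
}
```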
We have to assume that the vector store attached to the RAG agent is already populated with the embeddings representing the data we want to integrate during LLM interactions.
Imagine having previously populated a vector store with your software documentation, and now you want to ask questions about something you didn't clearly understand.
To start the execution of a RAG you must use the answer() method instead of chat().
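For example, assuming the MyChatBot agent sketched above and that agents expose a static make() constructor, a question against your documentation could look like this (the question text and the getContent() accessor on the response are illustrative):

```php
<?php

use App\Neuron\MyChatBot;
use NeuronAI\Chat\Messages\UserMessage;

// answer() retrieves the relevant documents from the vector store and
// passes them to the LLM together with the user message.
$response = MyChatBot::make()->answer(
    new UserMessage('How does the retry mechanism work?')
);

echo $response->getContent();
```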