Post Processor
Improve the RAG output by post-processing vector store results
As with most tools, RAG is easy to use but hard to master. The truth is that there is more to RAG than putting documents into a vector DB and adding an LLM on top. That can work, but it won't always.
With RAG, we perform a semantic search across many text documents; these could range from tens of thousands up to tens of billions of documents.
To ensure fast search times at scale, we typically use vector search — that is, we transform our text into vectors, place them all into a vector database, and compare their proximity to a query using a similarity algorithm (like cosine similarity).
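For intuition, cosine similarity is just the normalized dot product of two embedding vectors. A minimal, framework-independent sketch (not part of Neuron's API):

```php
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns a value in [-1, 1]; the closer to 1, the more similar
// the meaning of the two texts the vectors were embedded from.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```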
For vector search to work, we need vectors. These vectors are essentially compressions of the "meaning" behind some text into a (typically) 768- or 1536-dimensional vector. Some information is lost along the way because we're squeezing all of that meaning into a single vector.
Because of this information loss, we often see that the top three (for example) vector search documents will miss relevant information. Unfortunately, the retrieval may return relevant information below our top_k cutoff.
What do we do if relevant information at a lower position would help our LLM formulate a better response? The easiest approach is to increase the number of documents we're returning (increase top_k) and pass them all to the LLM.
Unfortunately, we cannot pass everything to the LLM: stuffing too much text into the context window dramatically reduces the LLM's ability to find the relevant information within it.
The solution to this issue is to retrieve plenty of documents from the vector store and then minimize the number of documents that make it to the LLM. To do that, we can reorder and filter the retrieved documents to keep just the most relevant ones.
Neuron allows you to define a list of post processor components to pipe as many transformations as you need to optimize the agent's output.
Rerankers
Reranking is one of the most popular post-processing operations you can apply to the retrieved documents. A reranking service calculates a similarity score for each document retrieved from the vector store against the input query.
We use this score to reorder the documents by relevance and take only the most useful.
Jina Reranker
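Here is a minimal sketch of a RAG agent wired with the Jina reranker. The agent class name, the Pinecone store, and the exact constructor parameters (topK, model, topN) are illustrative; adapt them to your vector store and framework version:

```php
use NeuronAI\PostProcessor\JinaRerankerPostProcessor;
use NeuronAI\RAG\RAG;
use NeuronAI\RAG\VectorStore\PineconeVectorStore;
use NeuronAI\RAG\VectorStore\VectorStoreInterface;

class MyChatBot extends RAG
{
    // provider() and embeddings() omitted for brevity.

    protected function vectorStore(): VectorStoreInterface
    {
        // Retrieve a generous candidate set of 50 documents.
        return new PineconeVectorStore(
            key: $_ENV['PINECONE_API_KEY'],
            indexUrl: $_ENV['PINECONE_INDEX_URL'],
            topK: 50
        );
    }

    protected function postProcessors(): array
    {
        return [
            // Rerank the 50 candidates against the query and
            // forward only the 5 most relevant to the LLM.
            new JinaRerankerPostProcessor(
                apiKey: $_ENV['JINA_API_KEY'],
                model: 'jina-reranker-v2-base-multilingual',
                topN: 5
            ),
        ];
    }
}
```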
In the example above you can see how the vector store is instructed to retrieve 50 documents, while the reranker keeps only the 5 most relevant ones.
Extending The Framework
With Neuron you can easily create your own post processor components by implementing the \NeuronAI\PostProcessor\PostProcessorInterface.
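The interface boils down to a single process method. The sketch below follows the signature used in Neuron's documentation; the exact namespaces of the Message and Document classes are assumptions and may differ across versions:

```php
namespace NeuronAI\PostProcessor;

use NeuronAI\Chat\Messages\Message;
use NeuronAI\RAG\Document;

interface PostProcessorInterface
{
    /**
     * Apply a transformation to the retrieved documents
     * and return the new list.
     *
     * @param Message $question The user query the documents were retrieved for.
     * @param array<Document> $documents The output of the previous step.
     * @return array<Document>
     */
    public function process(Message $question, array $documents): array;
}
```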
By implementing the process method you can perform any action on the list of documents and return the new list. Neuron will run the post processors in the same order they are listed in the postProcessors() method.
Here is a practical example:
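The following sketch filters out low-scoring documents. ScoreCutOffPostProcessor is a hypothetical class name, and the code assumes the Document object exposes its similarity score as a public score property:

```php
use NeuronAI\Chat\Messages\Message;
use NeuronAI\PostProcessor\PostProcessorInterface;
use NeuronAI\RAG\Document;

class ScoreCutOffPostProcessor implements PostProcessorInterface
{
    public function __construct(protected float $threshold) {}

    public function process(Message $question, array $documents): array
    {
        // Keep only the documents whose score reaches the threshold,
        // reindexing the array after the filter.
        return array_values(
            array_filter(
                $documents,
                fn (Document $document): bool => $document->score >= $this->threshold
            )
        );
    }
}
```

You can then register it in the postProcessors() method, for example after the reranker, so documents are first reordered by relevance and then cut off by score (assuming the reranker writes its relevance score back to each document):

```php
protected function postProcessors(): array
{
    return [
        new JinaRerankerPostProcessor(
            apiKey: $_ENV['JINA_API_KEY'],
            model: 'jina-reranker-v2-base-multilingual',
            topN: 10
        ),
        new ScoreCutOffPostProcessor(threshold: 0.6),
    ];
}
```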