
Post Processor

Improve the RAG output by post-processing vector store results

As with most tools, RAG is easy to use but hard to master. The truth is that there is more to RAG than putting documents into a vector DB and adding an LLM on top. That can work, but it won't always.

With RAG, we are performing a semantic search across many text documents — these could be tens of thousands up to tens of billions of documents.

To ensure fast search times at scale, we typically use vector search — that is, we transform our text into vectors, place them all into a vector database, and compare their proximity to a query using a similarity algorithm (like cosine similarity).
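To make the comparison step concrete, here is a minimal sketch of cosine similarity between two embedding vectors (plain PHP for illustration only, not part of the Neuron API):

// Cosine similarity between two equally sized embedding vectors.
// Returns a value between -1 and 1, where 1 means the vectors point in the same direction.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}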

For vector search to work, we need vectors. These vectors are essentially compressions of the "meaning" behind some text into (typically) 768 or 1536-dimensional vectors. There is some information loss because we're compressing this information into a single vector.

Because of this information loss, the top three (for example) vector search results will often miss relevant information. Unfortunately, that information may sit in documents ranked below our top_k cutoff, so it never reaches the LLM.

What do we do if relevant information at a lower position would help our LLM formulate a better response? The easiest approach is to increase the number of documents we're returning (increase top_k) and pass them all to the LLM.

Unfortunately, we cannot pass everything to the LLM, because doing so dramatically reduces its ability to find the relevant information within the text placed in its context window.

The solution to this issue is to retrieve plenty of documents from the vector store and then minimize the number of documents that make it to the LLM. To do that, you can reorder and filter the retrieved documents, keeping just the most relevant ones.

Neuron allows you to define a list of post-processor components to pipe as many transformations as you need to optimize the agent output.

Rerankers

Reranking is one of the most popular post-processing operations you can apply to the retrieved documents. A reranking service calculates a similarity score for each document retrieved from the vector store against the input query.

We use this score to reorder the documents by relevance and take only the most useful.
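Conceptually, this step boils down to sorting by the new score and slicing the list. The snippet below is only an illustrative sketch, not the framework's internal implementation; $scored is a hypothetical array of document/score pairs returned by a reranking service:

// $scored is a hypothetical list of ['document' => ..., 'score' => float] pairs.
// Sort by relevance score, highest first.
usort($scored, fn (array $a, array $b) => $b['score'] <=> $a['score']);

// Keep only the top N most relevant documents for the LLM.
$topDocuments = array_slice($scored, 0, 5);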

Jina Reranker

use NeuronAI\RAG\RAG;
use NeuronAI\RAG\Embeddings\EmbeddingsProviderInterface;
use NeuronAI\RAG\Embeddings\VoyageEmbeddingProvider;
use NeuronAI\RAG\PostProcessor\JinaRerankerPostProcessor;
use NeuronAI\RAG\VectorStore\PineconeVectorStore;
use NeuronAI\RAG\VectorStore\VectorStoreInterface;

class MyChatBot extends RAG
{
    ...
	
    protected function embeddings(): EmbeddingsProviderInterface
    {
        return new VoyageEmbeddingProvider(
            key: 'VOYAGE_API_KEY',
            model: 'VOYAGE_MODEL'
        );
    }
    
    protected function vectorStore(): VectorStoreInterface
    {
        return new PineconeVectorStore(
            key: 'PINECONE_API_KEY',
            indexUrl: 'PINECONE_INDEX_URL',
            topK: 50
        );
    }

    protected function postProcessors(): array
    {
        return [
            new JinaRerankerPostProcessor(
                apiKey: 'JINA_API_KEY',
                model: 'JINA_MODEL',
                topN: 5
            ),
        ];
    }
}

In the example above, the vector store is instructed to retrieve 50 documents, and the reranker then keeps only the 5 most relevant ones.

Observability

Neuron's built-in observability features automatically trace the execution of each post processor, so you can monitor interactions with external services in your Inspector account. Learn more in the observability section.

Extending The Framework

With Neuron you can easily create your custom post-processor components by simply implementing the \NeuronAI\RAG\PostProcessor\PostProcessorInterface:

namespace NeuronAI\RAG\PostProcessor;

use NeuronAI\Chat\Messages\Message;
use NeuronAI\RAG\Document;

interface PostProcessorInterface
{
    /**
     * Process an array of documents and return the processed documents.
     *
     * @param Message $question The question to process the documents for.
     * @param array<Document> $documents The documents to process.
     * @return array<Document> The processed documents.
     */
    public function process(Message $question, array $documents): array;
}

By implementing the process method, you can perform actions on the list of documents and return the new list. Neuron runs the post processors in the same order they are listed in the postProcessors() method.

Here is a practical example:

use NeuronAI\Chat\Messages\Message;
use NeuronAI\RAG\PostProcessor\PostProcessorInterface;
use NeuronAI\RAG\RAG;

// Implement your custom component
class CutOffPostProcessor implements PostProcessorInterface
{
    public function __construct(protected int $level) {}

    public function process(Message $question, array $documents): array
    {
        /*
         * Apply a cut off on the score returned by the vector store
         */
         
        return $documents;
    }
}

// Add to the agent
class MyChatBot extends RAG
{
    ...

    protected function postProcessors(): array
    {
        return [
            new CutOffPostProcessor(
                level: 0.9
            ),
        ];
    }
}
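
Since post processors run in the order they are listed, you can also chain your custom component with the built-in ones. For example (a sketch reusing the classes shown above), you could first drop the documents below your score threshold and then rerank the remaining ones:

class MyChatBot extends RAG
{
    ...

    protected function postProcessors(): array
    {
        return [
            // First drop the documents whose vector store score is below the threshold...
            new CutOffPostProcessor(
                level: 0.9
            ),
            // ...then rerank the remaining documents and keep the most relevant ones.
            new JinaRerankerPostProcessor(
                apiKey: 'JINA_API_KEY',
                model: 'JINA_MODEL',
                topN: 5
            ),
        ];
    }
}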
