To build a structured AI application, you need to convert the information you have into text so that you can generate embeddings, save them in a vector store, and then let your Agent use them to answer users' questions.
Neuron gives you several tools (data loaders) to simplify this process.
use NeuronAI\Providers\Embeddings\VoyageEmbeddingProvider;
use NeuronAI\RAG\DataLoader\FileDataLoader;
$provider = new VoyageEmbeddingProvider(
    key: 'VOYAGE_API_KEY',
    model: 'VOYAGE_API_MODEL',
);
// Use the file data loader to turn the file into documents, then embed them
$embeddedDocuments = $provider->embedDocuments(
    FileDataLoader::for(__DIR__.'/readme.md')->getDocuments()
);
The Neuron Data Loader is just a helper; you are free to use any technique you like to transform your raw data into text for generating embeddings.
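As an illustration of doing the conversion yourself, here is a minimal sketch that flattens a structured record (for example a database row) into plain text. The `recordToText` function and the field names are hypothetical and not part of Neuron; the resulting string is ordinary text that you could then hand to whatever embedding step you use.

```php
<?php

declare(strict_types=1);

/**
 * Flatten one record into a plain-text block suitable for embedding.
 * Hypothetical helper: not part of Neuron. Handles scalars and flat lists only.
 */
function recordToText(array $record): string
{
    $lines = [];
    foreach ($record as $field => $value) {
        // Skip empty values so they do not add noise to the text.
        if ($value === null || $value === '') {
            continue;
        }
        // Join flat lists into a comma-separated string.
        if (is_array($value)) {
            $value = implode(', ', $value);
        }
        $lines[] = ucfirst((string) $field) . ': ' . $value;
    }

    return implode("\n", $lines);
}

// Illustrative record; the field names are assumptions.
$text = recordToText([
    'title' => 'Getting started',
    'tags'  => ['rag', 'embeddings'],
    'body'  => 'Install the package with Composer.',
    'notes' => null,
]);
```

Keeping the field name next to each value is one possible design choice: it preserves a little context that would otherwise be lost when the structure is flattened.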
Process PDFs
To use PdfReader you need to install the pdftotext PHP extension.
By default, the FileDataLoader processes any simple text document. If you need to process other document formats and transform them into text, you can attach additional readers, such as the one for PDFs:
$provider = new VoyageEmbeddingProvider(
    key: 'VOYAGE_API_KEY',
    model: 'VOYAGE_API_MODEL',
);
// Register the PDF reader for files with the "pdf" extension
$documents = FileDataLoader::for(__DIR__.'/readme.pdf')
    ->addReader('pdf', \NeuronAI\RAG\DataLoader\PdfReader::class)
    ->getDocuments();
$embeddedDocuments = $provider->embedDocuments($documents);
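Conceptually, attaching a reader associates a file extension with something that knows how to extract text from that format. The sketch below illustrates that idea in plain PHP; `pickReader` and both callables are hypothetical, and this is not how Neuron implements it internally.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: choose a text extractor based on the file extension,
 * falling back to a default reader for unknown formats.
 */
function pickReader(string $path, array $readers, callable $default): callable
{
    // Normalize the extension so "Manual.PDF" and "manual.pdf" match the same reader.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $readers[$extension] ?? $default;
}

// Default reader: treat the file as plain text.
$plainText = fn (string $path): string => (string) file_get_contents($path);

$readers = [
    // Placeholder only: a real PDF reader would extract the text layer here.
    'pdf' => fn (string $path): string => '[text extracted from ' . basename($path) . ']',
];

$reader = pickReader('/docs/Manual.PDF', $readers, $plainText);
echo $reader('/docs/Manual.PDF'), PHP_EOL;
```

Any extension without a registered reader falls back to the plain-text callable, which mirrors the default behavior described above for simple text documents.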
StringDataLoader
If you already get your data as strings from a database or other sources, you can use the StringDataLoader to convert those strings into documents, ready to be embedded and stored by Neuron components:
use NeuronAI\RAG\DataLoader\StringDataLoader;
$contents = [
    // list of strings (the text you want to embed)
];
foreach ($contents as $text) {
    // Convert the string into documents
    $documents = StringDataLoader::for($text)->getDocuments();
    $embeddedDocuments = $provider->embedDocuments($documents);
    // Save the embedded documents into the vector store for later use when running your Agent.
    $vectorStore->addDocuments($embeddedDocuments);
}
Full Featured Example (from documents to vector store)
With this toolkit you can iterate over a list of files and convert them, making sure each file type has a reader that transforms the document into text in the right way.
Here is a complete example of a full-featured embedding process:
$vectorStore = new MemoryVectorStore();
$files = [
    // list of file paths...
];
$provider = new VoyageEmbeddingProvider(
    key: 'VOYAGE_API_KEY',
    model: 'VOYAGE_API_MODEL',
);
foreach ($files as $file) {
    // Register the PDF reader so PDF files are converted to text
    $documents = FileDataLoader::for($file)
        ->addReader('pdf', \NeuronAI\RAG\DataLoader\PdfReader::class)
        ->getDocuments();
    $embeddedDocuments = $provider->embedDocuments($documents);
    // Save the embedded documents into the vector store for later use when running your Agent.
    $vectorStore->addDocuments($embeddedDocuments);
}
$contents = [
    // list of strings (the text you want to embed)
];
foreach ($contents as $text) {
    // Convert the string into documents
    $documents = StringDataLoader::for($text)->getDocuments();
    $embeddedDocuments = $provider->embedDocuments($documents);
    // Save the embedded documents into the vector store for later use when running your Agent.
    $vectorStore->addDocuments($embeddedDocuments);
}
As you can see, it takes just a few lines of code. With this simple process you can ingest gigabytes of data into your vector store to feed your RAG agent.
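When the list of file paths is large, you will probably not want to write it by hand. One possible way to build it, sketched here with PHP's standard SPL iterators, is to walk a directory tree and keep only the extensions you have readers for. The `collectFiles` function name and the extension list are assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Collect the paths of all files under $directory whose (lowercased)
 * extension appears in $extensions. Hypothetical helper, not part of Neuron.
 *
 * @param string[] $extensions Lowercase extensions without the dot.
 * @return string[]
 */
function collectFiles(string $directory, array $extensions): array
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );

    $files = [];
    foreach ($iterator as $file) {
        // Each $file is an SplFileInfo instance.
        if ($file->isFile() && in_array(strtolower($file->getExtension()), $extensions, true)) {
            $files[] = $file->getPathname();
        }
    }

    // Sort so that repeated ingestion runs process files in the same order.
    sort($files);

    return $files;
}

// Example: gather Markdown, text, and PDF files below the current directory.
$files = collectFiles(__DIR__, ['md', 'txt', 'pdf']);
```

The resulting array has the same shape as the `$files` list in the example above, so it can be passed to the same `foreach` loop.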