Audio

Connect providers specialized in converting audio to text and vice versa.

Pure AI audio services usually don't support full agentic abilities like tools and memory. You can use these components as standalone services in an agentic workflow, or inside an Agent, since they implement the AIProviderInterface interface. In the latter case you benefit from agentic workflow features like middleware and guardrails.

These components can be helpful for creating local voice assistants for hands-free interaction with models. The typical flow involves capturing audio, transcribing it to text with a separate Speech-To-Text (STT) service, sending that text to an agent for processing, and then using Text-to-Speech (TTS) to speak the response.
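The flow above can be sketched as follows. This is an illustrative outline, not a documented recipe: the `OpenAISpeechToText` class, its `transcribe()` method, the `speak()` method, and the `captureAudio()`/`playAudio()` helpers are all assumptions introduced for the example; only `OpenAITextToSpeech`, `MyAgent`, and `UserMessage` appear in this page.

```php
<?php

use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

// 1. Capture audio from the microphone (captureAudio() is a hypothetical helper).
$audioFile = captureAudio();

// 2. Transcribe it with a Speech-To-Text service
//    (class and method names here are assumptions, not documented API).
$stt = new OpenAISpeechToText(key: 'OPENAI_API_KEY', model: 'whisper-1');
$transcript = $stt->transcribe($audioFile);

// 3. Send the transcribed text to the agent for processing.
$reply = MyAgent::make()
    ->chat(new UserMessage($transcript))
    ->getMessage()
    ->getContent();

// 4. Speak the reply with Text-To-Speech and play it back
//    (speak() and playAudio() are hypothetical).
$tts = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);
playAudio($tts->speak($reply));
```

Each stage is independent, so you can swap the STT or TTS provider (for example, ElevenLabs) without touching the agent in the middle.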

Neuron has broad support for local inference systems like Ollama, as well as providers like Cohere, to help you implement low-latency agentic workflows.

Wrap into an Agent

namespace App\Neuron;

use NeuronAI\Agent\Agent;
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

class MyAgent extends Agent
{
    protected function provider(): AIProviderInterface
    {
        return new OpenAITextToSpeech(
            key: 'OPENAI_API_KEY',
            model: 'gpt-4o-mini-tts',
            voice: 'alloy',
        );
    }
}

// Run the agent
$message = MyAgent::make()
    ->chat(new UserMessage("Hi!"))
    ->getMessage();

// Retrieve the audio part of the message (it's in base64 format)
$audioBase64 = $message->getAudio();

// Save the audio file
file_put_contents(__DIR__.'/assets/speech.mp3', base64_decode($audioBase64));

Direct use

OpenAI Audio

Text-To-Speech

Speech-To-Text

ElevenLabs

Text-To-Speech

Speech-To-Text
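As a rough sketch of direct use, a provider can presumably be instantiated and called without an Agent wrapper, since it implements AIProviderInterface. The `chat()` call below is an assumption modeled on the Agent example above, not a documented signature:

```php
<?php

use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

// Instantiate the provider directly, without wrapping it in an Agent.
$tts = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);

// Calling the provider directly (method shape assumed from AIProviderInterface usage).
$message = $tts->chat([new UserMessage("Hi!")]);

// The audio part of the message is base64-encoded, as in the Agent example.
file_put_contents(__DIR__.'/assets/speech.mp3', base64_decode($message->getAudio()));
```

Direct use skips Agent features like middleware and guardrails, which is a reasonable trade-off when you only need a one-shot conversion.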
