Audio

Connect providers specialized in converting audio to text and vice versa.

Pure AI audio services usually don't support full agentic capabilities like tools and multi-turn conversation. You can therefore use these components as standalone services in an agentic workflow, or use them inside an Agent, since they implement AIProviderInterface. In the latter case you benefit from agentic workflow features like middleware and guardrails.

These components are useful for building local voice assistants for hands-free interaction with models. The typical flow captures audio, transcribes it to text with a separate Speech-To-Text (STT) service, sends that text to an agent for processing, and then uses Text-To-Speech (TTS) to speak the response.
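The flow above can be sketched as follows. Note that the `OpenAISpeechToText` class name and its `transcribe()` method are assumptions for illustration only; check the provider pages linked below for the actual Speech-To-Text API. `MyAgent` is the agent defined in the next section, whose provider is a TTS component.

<?php

use NeuronAI\Chat\Messages\UserMessage;

// 1. Transcribe the captured audio to text
//    (hypothetical STT component and method, for illustration)
$stt = new \NeuronAI\Providers\OpenAI\Audio\OpenAISpeechToText(
    key: 'OPENAI_API_KEY',
    model: 'whisper-1',
);
$text = $stt->transcribe(__DIR__.'/assets/recording.mp3');

// 2. Send the transcript to an agent whose provider is a TTS component
$response = MyAgent::make()
    ->chat(new UserMessage($text))
    ->getMessage();

// 3. Decode the base64 audio in the response and save it for playback
file_put_contents(
    __DIR__.'/assets/reply.mp3',
    base64_decode($response->getAudio()->getContent())
);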

As an Agent provider

namespace App\Neuron;

use NeuronAI\Agent\Agent;
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

class MyAgent extends Agent
{
    protected function provider(): AIProviderInterface
    {
        return new OpenAITextToSpeech(
            key: 'OPENAI_API_KEY',
            model: 'gpt-4o-mini-tts',
            voice: 'alloy',
        );
    }
}

// Run the agent
$message = MyAgent::make()
    ->chat(new UserMessage("Hi!"))
    ->getMessage();

// Retrieve the audio part of the message (returned base64-encoded)
$audioBase64 = $message->getAudio()->getContent();

// Save the audio file
file_put_contents(__DIR__.'/assets/speech.mp3', base64_decode($audioBase64));

Direct use

OpenAI Audio

Text-To-Speech

Speech-To-Text

ElevenLabs

Text-To-Speech

Speech-To-Text