Audio

Connect providers specialized in converting audio to text and vice versa.

Pure AI audio services usually don't support full agentic abilities like tools and memory. You can use these components as standalone services in an agentic workflow, or inside an Agent, since they implement the AIProviderInterface interface. In the latter case you benefit from agentic workflow features like middleware and guardrails.

These components can be helpful for creating local voice assistants for hands-free interaction with models. The typical flow involves capturing audio, transcribing it to text with a separate Speech-To-Text (STT) service, sending that text to an agent for processing, and then using Text-to-Speech (TTS) to speak the response.
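The flow above can be sketched as follows. This is an illustrative outline, not a documented recipe: the `OpenAISpeechToText` class, its `transcribe()` method, the `speak()` method, and the `captureAudio()`/`playAudio()` helpers are all assumptions introduced for the example; only `OpenAITextToSpeech`, `MyAgent`, and `UserMessage` appear in this page.

```php
<?php

use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

// 1. Capture audio from the microphone (captureAudio() is a hypothetical helper).
$audioFile = captureAudio();

// 2. Transcribe it with a Speech-To-Text service
//    (class and method names here are assumptions, not documented API).
$stt = new OpenAISpeechToText(key: 'OPENAI_API_KEY', model: 'whisper-1');
$transcript = $stt->transcribe($audioFile);

// 3. Send the transcribed text to the agent for processing.
$reply = MyAgent::make()
    ->chat(new UserMessage($transcript))
    ->getMessage()
    ->getContent();

// 4. Speak the reply with Text-To-Speech and play it back
//    (speak() and playAudio() are hypothetical).
$tts = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);
playAudio($tts->speak($reply));
```

Each stage is independent, so you can swap the STT or TTS provider (for example, ElevenLabs) without touching the agent in the middle.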

Neuron has broad support for local inference systems like Ollama, as well as providers like Cohere, to help you implement low-latency agentic workflows.

Wrap into an Agent

namespace App\Neuron;

use NeuronAI\Agent\Agent;
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

class MyAgent extends Agent
{
    protected function provider(): AIProviderInterface
    {
        return new OpenAITextToSpeech(
            key: 'OPENAI_API_KEY',
            model: 'gpt-4o-mini-tts',
            voice: 'alloy',
        );
    }
}

// Run the agent
$message = MyAgent::make()
    ->chat(new UserMessage("Hi!"))
    ->getMessage();

// Retrieve the audio part of the message (it's in base64 format)
$audioBase64 = $message->getAudio();

// Save the audio file
file_put_contents(__DIR__.'/assets/speech.mp3', base64_decode($audioBase64));

Direct use

OpenAI Audio

Text-To-Speech

Speech-To-Text

ElevenLabs

Text-To-Speech

Speech-To-Text
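As a rough sketch of direct use, a provider can presumably be instantiated and called without an Agent wrapper, since it implements AIProviderInterface. The `chat()` call below is an assumption modeled on the Agent example above, not a documented signature:

```php
<?php

use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

// Instantiate the provider directly, without wrapping it in an Agent.
$tts = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);

// Calling the provider directly (method shape assumed from AIProviderInterface usage).
$message = $tts->chat([new UserMessage("Hi!")]);

// The audio part of the message is base64-encoded, as in the Agent example.
file_put_contents(__DIR__.'/assets/speech.mp3', base64_decode($message->getAudio()));
```

Direct use skips Agent features like middleware and guardrails, which is a reasonable trade-off when you only need a one-shot conversion.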
