Using Text to Speech Endpoints
These endpoints are used in a similar way to other AI Server endpoints, where you can provide:
- RefId: a unique identifier to track requests
- Tag: categorize like requests under a common group
In addition, Queue requests can provide:
- ReplyTo: URL to send a POST request to when the request is complete
Text to Speech
The Text to Speech endpoint converts text input into audio output.
var response = client.PostFilesWithRequest(new TextToSpeech {
        Input = "Hello, how are you?"
    },
    [new UploadFile("test_audio.wav", File.OpenRead("files/test_audio.wav"), "audio")]
);
// Download the first generated audio file to a local path
response.Outputs[0].Url.DownloadFileTo(outputFileName);
Queue Text to Speech
For generating longer audio files or when you want to process the request asynchronously, you can use the Queue Text to Speech endpoint.
var response = client.PostFilesWithRequest(new QueueTextToSpeech {
        Text = "Hello, how are you?"
    },
    [new UploadFile("test_audio.wav", File.OpenRead("files/test_audio.wav"), "audio")]
);
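The tracking options described at the top of this page can be set on the queued request. The following is a minimal sketch, assuming RefId, Tag and ReplyTo are exposed as properties on the QueueTextToSpeech request; the tag value and callback URL are hypothetical placeholders.

```csharp
using System;

// Sketch only: property names follow the options listed at the top of this page.
var request = new QueueTextToSpeech {
    Text = "Hello, how are you?",
    RefId = Guid.NewGuid().ToString("N"),          // unique identifier to track this request
    Tag = "greetings",                             // hypothetical group name for like requests
    ReplyTo = "https://example.org/tts-callback",  // hypothetical URL that receives a POST on completion
};
```

The request can then be sent with the same client call used in the example above.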
ComfyUI
The ComfyUI Agent uses PiperTTS to generate the audio files. To have the agent download the necessary models, set DEFAULT_MODELS in its .env file to include text-to-speech. PiperTTS via the ComfyUI Agent uses the preconfigured lessac model.
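As an illustration, the relevant .env entry for the ComfyUI Agent might look like the fragment below. This is a sketch showing only this one setting; if DEFAULT_MODELS already lists other models, text-to-speech would be added to the existing value, and the separator to use is not specified on this page.

```shell
# .env for the ComfyUI Agent (sketch; other entries omitted)
DEFAULT_MODELS=text-to-speech
```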
Available ComfyUI Models:
- text-to-speech: Default (Lessac)
- lessac: Piper TTS using the US English Lessac "high" voice model
OpenAI
If you have included an OPENAI_API_KEY in your .env file, you can also use the OpenAI API to generate audio files from text, which by default uses the alloy voice model.
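For reference, the .env entry would take the following shape. The value shown is a placeholder, not a real key.

```shell
# .env (sketch; replace the placeholder with your own OpenAI API key)
OPENAI_API_KEY=your-openai-api-key
```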
Available OpenAI Model Voice Options:
- text-to-speech: Default (Alloy)
- tts-alloy: Alloy
- tts-echo: Echo
- tts-fable: Fable
- tts-onyx: Onyx
- tts-nova: Nova
- tts-shimmer: Shimmer
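A voice from the list above would typically be selected on the request itself. The sketch below assumes the TextToSpeech request exposes a Model property that accepts these identifiers. That property name is an assumption not confirmed on this page, so check the request DTO before relying on it.

```csharp
// Sketch only: `Model` is an assumed property name, not confirmed by this page.
var request = new TextToSpeech {
    Input = "Hello, how are you?",
    Model = "tts-nova", // one of the voice identifiers listed above
};
```

The request can then be sent with the same client call shown in the Text to Speech example.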