Text to Speech

Using Text to Speech Endpoints

These endpoints are used in the same way as other AI Server endpoints, where you can provide:

  • RefId - provide a unique identifier to track requests
  • Tag - categorize like requests under a common group

In addition, Queue requests can provide:

  • ReplyTo - URL to send a POST request to when the request is complete
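As a minimal sketch of how these properties might be set, the example below adds RefId and Tag to a synchronous request and ReplyTo to a queued request, using the TextToSpeech and QueueTextToSpeech requests covered on this page. The base URL, the tag value, and the callback URL are placeholders. The AiServer.ServiceModel namespace is an assumption about where the request DTOs live, and any authentication your AI Server instance requires is omitted.

```csharp
using System;
using ServiceStack;
using AiServer.ServiceModel; // assumed namespace for the AI Server request DTOs

// Placeholder base URL for your AI Server instance
var client = new JsonApiClient("https://localhost:5006");

// Synchronous request, tracked by a caller-supplied RefId and grouped by Tag
var response = client.Post(new TextToSpeech {
    Input = "Hello, how are you?",
    RefId = Guid.NewGuid().ToString("N"), // any unique value works
    Tag = "greetings",                    // hypothetical group name
});

// Queued request: a POST is sent to ReplyTo when the request is complete
var queued = client.Post(new QueueTextToSpeech {
    Text = "Hello, how are you?",
    RefId = Guid.NewGuid().ToString("N"),
    Tag = "greetings",
    ReplyTo = "https://example.org/api/tts-complete", // hypothetical callback URL
});
```

Generating the RefId on the caller's side means the identifier is known before the request is sent, so it can be stored and matched against the callback later.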

Text to Speech

The Text to Speech endpoint converts text input into audio output.

// Text to Speech takes only text as input, so no file upload is needed
var response = client.Post(new TextToSpeech {
    Input = "Hello, how are you?"
});
// Download the generated audio; outputFileName is the local path to save it to
response.Outputs[0].Url.DownloadFileTo(outputFileName);

Queue Text to Speech

To generate longer audio files, or to process the request asynchronously, use the Queue Text to Speech endpoint.

// Queue the request; the audio is generated asynchronously
var response = client.Post(new QueueTextToSpeech {
    Text = "Hello, how are you?"
});

ComfyUI

The ComfyUI Agent uses PiperTTS to generate the audio files. To have the agent download the necessary models, include text-to-speech in the DEFAULT_MODELS setting of your ComfyUI Agent's .env file. PiperTTS via the ComfyUI Agent uses the preconfigured lessac model.

Available ComfyUI Models:

  • text-to-speech - Default (Lessac)
  • lessac - Piper TTS using the US English Lessac "high" voice model
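As a sketch of that configuration, the ComfyUI Agent's .env file could contain the entry below. Only the text-to-speech value comes from this page; if your agent already lists other models in DEFAULT_MODELS, keep them alongside it in whatever format that file already uses.

```shell
# ComfyUI Agent .env: request the text-to-speech (PiperTTS lessac) model
DEFAULT_MODELS=text-to-speech
```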

OpenAI

If your .env file includes an OPENAI_API_KEY, you can also use the OpenAI API to generate audio files from text. By default it uses the alloy voice model.
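For illustration, the corresponding .env entry might look like the line below. The value is a placeholder, not a real key.

```shell
# AI Server .env: enables the OpenAI text-to-speech provider
OPENAI_API_KEY=your-openai-api-key
```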

Available OpenAI Model Voice Options:

  • text-to-speech - Default (Alloy)
  • tts-alloy - Alloy
  • tts-echo - Echo
  • tts-fable - Fable
  • tts-onyx - Onyx
  • tts-nova - Nova
  • tts-shimmer - Shimmer
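This page lists the model names but does not show how to select one. The sketch below assumes the request exposes a Model property that accepts a name from the list above. That property, the AiServer.ServiceModel namespace, the base URL, and the output file name are all assumptions made for illustration, so check them against your AI Server version before relying on them.

```csharp
using ServiceStack;
using AiServer.ServiceModel; // assumed namespace for the AI Server request DTOs

// Placeholder base URL for your AI Server instance
var client = new JsonApiClient("https://localhost:5006");

// Hypothetical local path; the actual audio format may differ
var outputFileName = "hello-nova.mp3";

var response = client.Post(new TextToSpeech {
    Input = "Hello, how are you?",
    Model = "tts-nova", // assumed property; omit it to use the text-to-speech default
});

// Save the first generated output to the local file
response.Outputs[0].Url.DownloadFileTo(outputFileName);
```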