AI Server APIs

AI Server provides a set of APIs for interacting with AI models and services, as well as some common image, video and audio processing tasks.

Every API provides two modes of operation: synchronous and asynchronous.

  • Synchronous: The request is processed immediately and the response is returned once processing completes, including a URL to download the result. Synchronous requests have a 60 second timeout.
  • Asynchronous: The request is queued and processed in the background. The response returns immediately with a URL that can be used to download the result once it's ready.
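
As a minimal sketch of the two modes, the example below calls the Text to Image endpoints listed under Generative AI Endpoints further down. It assumes Node 18+ (for the global fetch), an example server address, and bearer-token authentication; the request and response field names (e.g. prompt) are illustrative assumptions rather than the server's actual DTOs, which the generated typed clients define.

const baseUrl = "https://localhost:5005";              // assumed AI Server address
const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.AI_SERVER_API_KEY}`, // assumed auth scheme
};

// Synchronous: waits for generation to complete (60 second timeout)
const syncRes = await fetch(`${baseUrl}/api/TextToImage`, {
    method: "POST",
    headers,
    body: JSON.stringify({ prompt: "a watercolor fox" }),  // `prompt` is an assumed field name
});
console.log(await syncRes.json());  // expected to include URL(s) to the generated image(s)

// Asynchronous: queues the job and returns immediately
const queueRes = await fetch(`${baseUrl}/api/QueueTextToImage`, {
    method: "POST",
    headers,
    body: JSON.stringify({ prompt: "a watercolor fox" }),
});
console.log(await queueRes.json()); // expected to include a URL to download the result once ready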

Image Generation APIs

AI Server has built-in ComfyUI workflows for performing image generation tasks using AI models like SDXL and Flux.

The following tasks are available for image generation:

  • Text to Image - Generate an image based on provided text prompts.
  • Image to Image - Generate a new image based on an input image and provided prompts.
  • Image with Mask - Generate a new image based on an input image, a mask, and provided prompts (applied only to the masked area).
  • Image Upscale - Upscale an input image to a higher resolution (currently 2x only).
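
For example, the Image Upscale task above can be called with a multipart upload. This sketch assumes Node 18+ globals (fetch, FormData, Blob), an example server address, and that the endpoint accepts the input image as a multipart form field named image; the actual DTO may differ.

import { readFile } from "node:fs/promises";

const baseUrl = "https://localhost:5005"; // assumed AI Server address

// Upload an image for 2x upscaling; the `image` form field name is an assumption.
const form = new FormData();
form.append("image", new Blob([await readFile("photo.png")]), "photo.png");

const res = await fetch(`${baseUrl}/api/ImageUpscale`, { method: "POST", body: form });
console.log(await res.json()); // expected to include a URL to the upscaled image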

Speech APIs

AI Server provides endpoints for speech-related tasks, including Speech-to-Text and Text-to-Speech conversions. These endpoints utilize AI models to process audio and text data.

The following tasks are available for speech processing:

  • Speech to Text - Transcribe audio into text.
  • Text to Speech - Generate audio from provided text.
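
A sketch of a synchronous Text to Speech call follows, assuming Node 18+, an example server address, and a text request field; the response handling assumes the result exposes a download URL for the generated audio, which may differ from the actual DTO.

import { writeFile } from "node:fs/promises";

const baseUrl = "https://localhost:5005"; // assumed AI Server address

const res = await fetch(`${baseUrl}/api/TextToSpeech`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: "Welcome to AI Server" }), // `text` is an assumed field name
});
const result = await res.json();
console.log(result); // per the modes of operation above, expect a URL to the generated audio

// Assumption: the result exposes a download URL, e.g. `result.url`.
if (result.url) {
    const audio = await fetch(result.url);
    await writeFile("speech.mp3", Buffer.from(await audio.arrayBuffer()));
}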

AI Server API Endpoints

AI Server has endpoints for AI tasks as well as media processing tasks.

Generative AI Endpoints

  • Chat: Interact with LLMs to generate text.
    • Sync: /api/OpenAiChatCompletion
    • Async: /api/QueueOpenAiChatCompletion
  • Image: Generate images from text.
    • Sync: /api/TextToImage
    • Async: /api/QueueTextToImage
  • Image to Image: Generate images from images.
    • Sync: /api/ImageToImage
    • Async: /api/QueueImageToImage
  • Image With Mask: Generate images from images with a mask.
    • Sync: /api/ImageWithMask
    • Async: /api/QueueImageWithMask
  • Image Upscale: Upscale images.
    • Sync: /api/ImageUpscale
    • Async: /api/QueueImageUpscale
  • Image To Text: Generate text from images.
    • Sync: /api/ImageToText
    • Async: /api/QueueImageToText
  • Speech to Text: Transcribe audio to text.
    • Sync: /api/SpeechToText
    • Async: /api/QueueSpeechToText
  • Text To Speech: Generate audio from text.
    • Sync: /api/TextToSpeech
    • Async: /api/QueueTextToSpeech
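
For instance, the synchronous Chat endpoint above accepts an OpenAI-style chat completion request (see the INFO note below). This sketch assumes an example server address and model name:

const baseUrl = "https://localhost:5005"; // assumed AI Server address

const res = await fetch(`${baseUrl}/api/OpenAiChatCompletion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
        model: "llama3:8b",                      // assumed model name; use one configured on your server
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user",   content: "What is the capital of France?" },
        ],
    }),
});
const completion = await res.json();
console.log(completion.choices?.[0]?.message?.content); // OpenAI-style response shape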

INFO

The Chat API is also available as an OpenAI-compatible endpoint at /v1/chat/completions with matching DTOs. Not every OpenAI client library will work with this endpoint, but the structure of the request and response is the same as OpenAI's.
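
As a sketch of using an existing OpenAI client against this endpoint (noting the caveat above that not all clients will work), the official openai npm package can be pointed at AI Server by overriding its base URL. The server address, API key handling, and model name below are assumptions:

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://localhost:5005/v1",          // assumed AI Server address
    apiKey: process.env.AI_SERVER_API_KEY ?? "unused", // assumed auth; substitute whatever your server expects
});

const completion = await client.chat.completions.create({
    model: "llama3:8b",                            // assumed model name
    messages: [{ role: "user", content: "Summarize what AI Server does in one sentence." }],
});
console.log(completion.choices[0].message.content);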

Media Processing Endpoints

Media endpoints are used for processing images and videos. Videos are processed remotely by the ComfyUI Agent, while image processing is done by the AI Server itself.

Video Processing Endpoints

Image Processing Endpoints

Architecture

The AI Server is designed to be a lightweight router for AI services, providing a common interface for accessing them via APIs with typed client support in many languages. As such, heavy processing tasks are offloaded to other services, including self-hosted ones like the ComfyUI Agent.

graph TD
    A[API Client] -->|API Request| B[AI Server]
    B -->|API Request| C[Replicate API]
    B -->|API Request| I[OpenRouter API]
    B -->|API Request| D[OpenAI API]
    B -->|Video Processing| E[ComfyUI Agent]
    E -->|Video Processing| F[FFmpeg]
    E -->|AI Processing| G[PiperTTS]
    E -->|AI Processing| J[Whisper]
    E -->|AI Processing| K[SDXL]
    E -->|AI Processing| L[Flux.1.Schnell]
    B -->|Image Processing| H[AI Server]
    B -->|AI Processing| E[ComfyUI Agent]