AI Server provides a set of APIs for interacting with AI models and services, as well as some common image, video and audio processing tasks.
Every API provides two modes of operation: synchronous and asynchronous.
- Synchronous: The request is processed immediately and the response returns a URL to download the result. This has a timeout of 60 seconds.
- Asynchronous: The request is queued and processed in the background. The response returns a URL to download the result when it's ready.
Image Generation APIs​
AI Server has built-in ComfyUI workflows for performing image generation tasks using AI models like SDXL and Flux.
The following tasks are available for image generation:
- Text to Image - Generate an image based on provided text prompts.
- Image to Image - Generate a new image based on an input image and provided prompts.
- Image with Mask - Generate a new image based on an input image, a mask, and provided prompts (applied only to the masked area).
- Image Upscale - Upscale an input image to a higher resolution (currently 2x only).
Speech APIs​
AI Server provides endpoints for speech-related tasks, including Speech-to-Text and Text-to-Speech conversions. These endpoints utilize AI models to process audio and text data.
The following tasks are available for speech processing:
- Speech to Text - Convert audio input to text output.
- Text to Speech - Convert text input to audio output.
AI Server API Endpoints​
AI Server has endpoints for AI tasks as well as media processing tasks.
Generative AI Endpoints​
- Chat: Interact with LLMs to generate text.
- Sync:
/api/OpenAiChatCompletion
- Async:
/api/QueueOpenAiChatCompletion
- Sync:
- Image: Generate images from text.
- Sync:
/api/TextToImage
- Async:
/api/QueueTextToImage
- Sync:
- Image to Image: Generate images from images.
- Sync:
/api/ImageToImage
- Async:
/api/QueueImageToImage
- Sync:
- Image With Mask: Generate images from images with a mask.
- Sync:
/api/ImageWithMask
- Async:
/api/QueueImageWithMask
- Sync:
- Image Upscale: Upscale images.
- Sync:
/api/ImageUpscale
- Async:
/api/QueueImageUpscale
- Sync:
- Image To Text: Generate text from images.
- Sync:
/api/ImageToText
- Async:
/api/QueueImageToText
- Sync:
- Speech to Text: Transcribe audio to text.
- Sync:
/api/SpeechToText
- Async:
/api/QueueSpeechToText
- Sync:
- Text To Speech: Generate audio from text.
- Sync:
/api/TextToSpeech
- Async:
/api/QueueTextToSpeech
- Sync:
INFO
The Chat API is also available as an OpenAI compatible endpoint at /v1/chat/completions
with matching DTOs.
While not all clients will work with this endpoint, the structure of the request and response is the same.
Media Processing Endpoints​
Media endpoints are used for processing images, and videos. Videos are processed remotely by the ComfyUI Agent, while image processing is done by the AI Server itself.
Video Processing Endpoints​
- Scale Video: Scale a video to a different resolution.
- Sync:
/api/ScaleVideo
- Async:
/api/QueueScaleVideo
- Sync:
- Crop Video: Crop a video to a specific size.
- Sync:
/api/CropVideo
- Async:
/api/QueueCropVideo
- Sync:
- Watermark Video: Add a watermark to a video.
- Sync:
/api/WatermarkVideo
- Async:
/api/QueueWatermarkVideo
- Sync:
- Convert Video: Convert a video to a different format.
- Sync:
/api/ConvertVideo
- Async:
/api/QueueConvertVideo
- Sync:
- Trim Video: Trim a video to a specific length via a start and end time.
- Sync:
/api/TrimVideo
- Async:
/api/QueueTrimVideo
- Sync:
Image Processing Endpoints​
- Scale Image: Scale an image to a different resolution.
- Sync:
/api/ScaleImage
- Async:
/api/QueueScaleImage
- Sync:
- Crop Image: Crop an image to a specific size.
- Sync:
/api/CropImage
- Async:
/api/QueueCropImage
- Sync:
- Watermark Image: Add a watermark to an image.
- Sync:
/api/WatermarkImage
- Async:
/api/QueueWatermarkImage
- Sync:
- Convert Image: Convert an image to a different format.
- Sync:
/api/ConvertImage
- Async:
/api/QueueConvertImage
- Sync:
Architecture​
The AI Server is designed to be a lite-weight router for AI services, providing a common interface for AI services to be accessed via APIs with typed client support in many languages. As such, heavy processing tasks are offloaded to other services, including self-hosted ones like the ComfyUI Agent.
graph TD A[API Client] -->|API Request| B(<img class="w-24 h-24" src="/img/logo.svg"/>) B -->|API Request| C[Replicate API] B -->|API Request| I[OpenRouter API] B -->|API Request| D[OpenAI API] B -->|Video Processing| E[ComfyUI Agent] E -->|Video Processing| F[FFmpeg] E -->|AI Processing| G[PiperTTS] E -->|AI Processing| J[Whisper] E -->|AI Processing| K[SDXL] E -->|AI Processing| L[Flux.1.Schnell] B -->|Image Processing| H[AI Server] B -->|AI Processing| E[ComfyUI Agent]