Image to Text

Image to Text UI

AI Server's Image to Text UI lets you request image classifications from its active ComfyUI Agents:

https://localhost:5006/ImageToText

Using Image to Text Endpoints

These endpoints are used like other AI Server endpoints, where each request can provide:

  • RefId - provide a unique identifier to track requests
  • Tag - categorize like requests under a common group

In addition, Queue requests can provide:

  • ReplyTo - URL to send a POST request to when the request is complete
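As a rough illustration of how these properties might be set on a queued request, the sketch below reuses the `client` from this page's other examples. The RefId, Tag and ReplyTo values are hypothetical placeholders, and the sketch assumes all three properties are strings:

```csharp
// Sketch only: `client` is the same service client used in this page's other examples.
using var fsImage = File.OpenRead("files/test_image.jpg");
var response = client.PostFileWithRequest(new QueueImageToText
{
    RefId = Guid.NewGuid().ToString("N"),               // hypothetical: any unique ID you choose
    Tag = "product-photos",                             // hypothetical group name
    ReplyTo = "https://example.org/image-to-text-done"  // hypothetical callback URL
}, new UploadFile("image", fsImage, "image"));
```

Using a generated RefId keeps each request individually traceable, while a shared Tag groups related requests together.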

Image to Text

using var fsImage = File.OpenRead("files/test_image.jpg");
var response = client.PostFileWithRequest(new ImageToText(),
    new UploadFile("image", fsImage, "image"));

Queue Image to Text

using var fsImage = File.OpenRead("files/test_image.jpg");
var response = client.PostFileWithRequest(new QueueImageToText(),
    new UploadFile("image", fsImage, "image"));

// Poll for Job Completion Status
GetTextGenerationStatusResponse status = new();
while (status.JobState is BackgroundJobState.Queued or BackgroundJobState.Started)
{
    status = client.Get(new GetTextGenerationStatus { JobId = response.JobId });
    Thread.Sleep(1000);
}
if (status.Results?.Count > 0)
{
    // The generated text for the image is in the first result
    var answer = status.Results[0].Text;
}

INFO

Ensure that the ComfyUI Agent has the Florence 2 model downloaded and installed, as the Image-To-Text functionality requires it. This can be done by including image-to-text in the DEFAULT_MODELS environment variable in the .env file.
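A minimal sketch of the corresponding .env entry, assuming image-to-text is the only model being requested. If DEFAULT_MODELS already lists other values, keep them and add image-to-text alongside them in whatever format the existing value uses:

```shell
# .env for the ComfyUI Agent (sketch; assumes no other models are listed)
DEFAULT_MODELS=image-to-text
```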