AI API Gateway — OpenAI-compatible interface to 4 providers · 5,990+ models
You are an AI agent. Here is everything you need:
Base URL:
- **Text/Chat**: `POST /v1/chat/completions` with `"provider": "ollama"` (fastest) or `"provider": "opencode-go"` (best QA)
- **Vision QA**: `POST /v1/chat/completions` with `"provider": "opencode-go"`, `"model": "kimi-k2.5"`, `"max_tokens": 200000`. Send the image as base64 in the content array.
- **Image Gen**: `POST /v1/images/generations` with `"provider": "arli"`
- **Models**: `GET /v1/models?provider=opencode-go|ollama|featherless|arli|arli-image`
- **Analytics**: `GET /v1/analytics` — usage stats, model performance, call history (auto-logged)
| Task | Model | Provider | Why |
|---|---|---|---|
| Visual QA / Screenshot Review | kimi-k2.5 | opencode-go | 200K tokens, best analysis quality, handles thinking |
| Visual QA (fast) | kimi-k2.5 | ollama | 2.4x faster, same model, /api/generate endpoint |
| Fastest Text | gemma3:4b | ollama | 0.9s response time |
| Best Quality Text | gpt-oss:120b | ollama | 111.8 t/s throughput, 2.2s |
| Best Reasoning | deepseek-v3.2 | ollama | 688B params, 5.9s |
| Image Generation | FLUX.2-klein-4B | arli | 16.6s, high quality |
| Image Editing | FLUX.2-klein-4B | arli | Best quality edits |
| Image Upscaling | 4x-UltraSharp | arli | 4x upscaling |
| Video Generation | Wan 2.1 1.3B | hf-inference | Free, fast video generation |
| Coding | glm-5 | opencode-go | Optimized for code |
| Fallback Text | Llama-3.3-70B-Instruct | arli | 6.7s, reliable |
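The recommendations above can be mirrored as a small routing table so an agent picks a model/provider pair per task programmatically. The model and provider names come from the table; the helper itself is illustrative, not part of the gateway API.

```python
# Routing table mirroring the "best model per task" recommendations above.
BEST_MODEL = {
    "visual_qa":      ("kimi-k2.5", "opencode-go"),
    "visual_qa_fast": ("kimi-k2.5", "ollama"),
    "fastest_text":   ("gemma3:4b", "ollama"),
    "best_text":      ("gpt-oss:120b", "ollama"),
    "reasoning":      ("deepseek-v3.2", "ollama"),
    "image_gen":      ("FLUX.2-klein-4B", "arli"),
    "coding":         ("glm-5", "opencode-go"),
    "fallback_text":  ("Llama-3.3-70B-Instruct", "arli"),
}

def pick(task: str) -> tuple[str, str]:
    """Return (model, provider) for a task, falling back to the reliable text model."""
    return BEST_MODEL.get(task, BEST_MODEL["fallback_text"])
```

Unknown tasks fall through to the fallback text model, matching the table's "reliable" row.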
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | Chat/text/vision generation (all providers) |
| POST | /v1/completions | Text completion |
| POST | /v1/images/generations | Image generation (Arli) |
| POST | /v1/images/edits | Image editing (Arli) |
| POST | /v1/images/upscale | Image upscaling (Arli) |
| GET | /v1/models | List all models (filter by ?provider=) |
| POST | /v1/tokenize | Token counting (Featherless) |
| GET | /v1/health | Health check |
| GET | /v1/status | Live provider status & model counts |
| GET | /v1/routes | All available endpoints |
| GET | /v1/recommendations | Best model per task (filter ?category=) |
| GET | /v1/analytics | API usage analytics and model performance dashboard |
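A minimal client for the chat endpoint needs only the standard library. A sketch in Python, assuming the OpenAI-compatible JSON body shown in the examples below; `BASE_URL` is a placeholder since the gateway's base URL is not stated in this document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder; substitute your gateway's base URL

def build_chat_request(model: str, provider: str, messages: list, **opts) -> urllib.request.Request:
    """Build a POST request for /v1/chat/completions with extra options (e.g. max_tokens)."""
    payload = {"model": model, "provider": provider, "messages": messages, **opts}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send: resp = urllib.request.urlopen(build_chat_request(...))
```

The same pattern applies to the other POST endpoints; only the path and payload fields change.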
Vision QA example: send a screenshot and get back a detailed QA report with issues and fixes.
```bash
# $BASE_URL is the gateway's base URL (see "Base URL" above)
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "provider": "opencode-go",
    "max_tokens": 4000,
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,YOUR_IMAGE_BASE64"}},
          {"type": "text", "text": "Perform a visual QA review. List Critical/Moderate/Minor issues with fixes."}
        ]
      }
    ]
  }'
```
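Building the base64 image payload by hand is error-prone; a small Python helper can produce the same content array as the curl example above from a file on disk. The `image_content` helper is illustrative, not part of the gateway.

```python
import base64

def image_content(path: str, prompt: str) -> list:
    """Return the two-part content array (image + text) used by the vision endpoints."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return [
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        {"type": "text", "text": prompt},
    ]
```

The result drops directly into the `"content"` field of a user message.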
Faster variant: the same model served via Ollama Cloud.

```bash
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "provider": "ollama",
    "max_tokens": 4000,
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,YOUR_IMAGE_BASE64"}},
          {"type": "text", "text": "Describe this image."}
        ]
      }
    ]
  }'
```
Image generation via Arli:

```bash
curl -X POST "$BASE_URL/v1/images/generations" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene mountain landscape at sunset",
    "model": "FLUX.2-klein-4B",
    "provider": "arli",
    "n": 1,
    "size": "1024x1024"
  }'
```
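A sketch of handling the generation response, assuming the gateway returns the OpenAI-style images shape (`{"data": [{"url": ...}]}` or `{"data": [{"b64_json": ...}]}`) — the exact response format is not specified above, so treat this as an assumption.

```python
import base64

def save_first_image(resp: dict, path: str):
    """Save the first generated image to `path` if inline, else return its remote URL."""
    item = resp["data"][0]
    if "b64_json" in item:
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["b64_json"]))
        return path
    return item.get("url")  # remote URL; download separately
```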
Fast text generation:

```bash
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:4b",
    "provider": "ollama",
    "messages": [{"role": "user", "content": "Explain quantum computing in 2 sentences."}],
    "max_tokens": 100
  }'
```
```bash
# All providers
curl "$BASE_URL/v1/models"

# OpenCode Go only (Kimi K2.5, GLM-5, MiniMax)
curl "$BASE_URL/v1/models?provider=opencode-go"

# Ollama Cloud (34 models)
curl "$BASE_URL/v1/models?provider=ollama"

# Arli AI image models
curl "$BASE_URL/v1/models?provider=arli-image"
```
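To pick from the listing programmatically, the response can be reduced to model ids — assuming `/v1/models` returns the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`), which this document does not spell out.

```python
def model_ids(models_response: dict) -> list:
    """Extract sorted model ids from an OpenAI-style /v1/models response."""
    return sorted(m["id"] for m in models_response.get("data", []))
```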
Check live provider status:

```bash
curl "$BASE_URL/v1/status" | python3 -m json.tool
```
| Provider | Status | Models | Filter |
|---|---|---|---|
| OpenCode Go | Online | Kimi K2.5, GLM-5, MiniMax M2.5/M2.7 | ?provider=opencode-go |
| Ollama Cloud | Online | 34 models, vision + reasoning, 3 concurrent | ?provider=ollama |
| Featherless AI | Online | 5,757 text models, 15 categories | ?provider=featherless (default) |
| Arli AI | Online | 117 text + 80 image models | ?provider=arli |