Model-agnostic generative vision API for images and video. Text-to-image, image-to-image, text-to-video, and image-to-video with backends for MLX-Gen, Diffusers, stable-diffusion.cpp, and OpenAI-compatible HTTP.
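from abstractvision import VisionManager, LocalAssetStore
# `backend` is a configured backend instance (for example, OpenAICompatibleVisionBackend)
vm = VisionManager(backend=backend, store=LocalAssetStore())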
# Text to Image
img = vm.generate_image("a red fox in snow")
# Image to Image
edit = vm.edit_image("make it watercolor", image="input.png")
AbstractVision provides a model-agnostic generative vision API covering images and video. It ships a capability registry, artifact-ref outputs, and backends for both remote and local inference, from cloud APIs to quantized MLX models on Apple Silicon.
VisionManager is a thin orchestration layer that delegates to pluggable backends. It supports text_to_image, image_to_image, text_to_video, and image_to_video, with optional artifact-ref outputs for cross-process portability.
The capability registry is backed by a packaged vision_model_capabilities.json that declares what each model supports. Query tasks, models, and capabilities programmatically. Entries are curated and carry license and precision metadata per ADR 0005.
The AbstractCore plugin is discovered automatically via entry points. It exposes llm.vision.t2i(...), llm.vision.i2i(...), provider catalog discovery, and model residency control for local backends through the unified capability contract.
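To make the capability-registry idea concrete, here is a minimal, self-contained sketch of task lookup over declared model capabilities. It is not the library's implementation: the `CAPABILITIES` mapping is a hypothetical in-memory stand-in, the real vision_model_capabilities.json schema may differ, and only the function names mirror the registry methods shown later in this document.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the packaged capability data (illustrative subset).
# The actual JSON schema used by AbstractVision is an assumption here.
CAPABILITIES: Dict[str, Set[str]] = {
    "runwayml/stable-diffusion-v1-5": {"text_to_image"},
    "Qwen/Qwen-Image-Edit-2511": {"image_to_image"},
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers": {"text_to_video", "image_to_video"},
}


def supports(model_id: str, task: str) -> bool:
    """Return True if the model is declared to support the task."""
    return task in CAPABILITIES.get(model_id, set())


def list_tasks() -> List[str]:
    """Return every task declared by at least one model, sorted by name."""
    return sorted({task for tasks in CAPABILITIES.values() for task in tasks})


def models_for_task(task: str) -> List[str]:
    """Return the model ids that declare support for the task, sorted."""
    return sorted(m for m, tasks in CAPABILITIES.items() if task in tasks)
```

Unknown models simply report no capabilities, so callers can check `supports(...)` before dispatching a request to a backend.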
Generate images and video with the backend that fits your hardware and workflow. Every backend normalizes parameters through the shared model capability registry.
Generate images from text prompts across all four backends. Supports Stable Diffusion, FLUX.2, Qwen Image, ERNIE, FIBO, and OpenAI models with configurable steps, guidance scale, and resolution.
Edit and transform existing images. Supported across MLX-Gen (FLUX.2 klein/base, Qwen Image Edit, FIBO Edit), Diffusers, and OpenAI-compatible backends. Mask inputs supported where the runtime allows.
Local video generation via MLX-Gen Wan 2.2 TI2V on Apple Silicon. Remote video via OpenAI-compatible HTTP endpoints. Frame/step progress reporting in shell and Python callbacks.
Animate a first-frame image into video. MLX-Gen Wan 2.2 TI2V provides local first-frame to video generation. Remote endpoints available through the OpenAI-compatible backend.
MLX-Gen provides Apple Silicon-optimized q4/q8 quantized model presets for the FLUX.2, Qwen Image, Z-Image, ERNIE, and FIBO families, plus Wan 2.2 TI2V for video. Presets are published in the AbstractFramework HF collection.
Full Hugging Face Diffusers pipeline support. Stable Diffusion 1.x/XL/3.x, FLUX.2, Qwen Image Edit, and more. Device auto-detection (CUDA, MPS, CPU) with configurable precision.
stable-diffusion.cpp runs GGUF diffusion models via sd-cli or Python bindings. It auto-installs sd-cli on first use. Curated FLUX/Qwen GGUF bundles come with automatic VAE/text-encoder companion resolution.
Generated outputs return lightweight JSON artifact refs ({"$artifact": ...}) that travel across processes. Use LocalAssetStore for direct storage or RuntimeArtifactStoreAdapter for AbstractRuntime integration.
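As an illustration of why a small JSON ref is portable, the sketch below serializes a ref, parses it back as another process would, and checks its shape. The `is_artifact_ref` helper and the `"example-id"` value are hypothetical, and treating the `$artifact` value as a string id is an assumption; only the `$artifact` and `content_type` keys come from the example output shown in this document.

```python
import json
from typing import Any


def is_artifact_ref(value: Any) -> bool:
    """Hypothetical check: a dict carrying a string under the "$artifact" key."""
    return isinstance(value, dict) and isinstance(value.get("$artifact"), str)


# Placeholder ref; a real id would come from the asset store.
ref = {"$artifact": "example-id", "content_type": "image/png"}

# Serialize for transport (queue, HTTP body, file), then parse on the other side.
wire = json.dumps(ref)
received = json.loads(wire)
```

Because the ref is plain JSON, the receiving side only needs the id to resolve the stored bytes through whichever store it has access to.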
Get generating in minutes. The base install is lightweight; local inference runtimes are explicit extras.
# Base install (OpenAI-compatible HTTP backend)
pip install abstractvision
# Apple Silicon local (MLX-Gen + Diffusers + sd.cpp)
pip install "abstractvision[all-apple]"
# GPU local (Diffusers + sd.cpp)
pip install "abstractvision[all-gpu]"
# Individual backends
pip install "abstractvision[mlx-gen]" # Apple Silicon MLX-Gen
pip install "abstractvision[diffusers]" # HuggingFace Diffusers
pip install "abstractvision[sdcpp]" # stable-diffusion.cpp
# Browse available presets
abstractvision catalog --provider mlx-gen
# Download curated MLX q4 presets
abstractvision download AbstractFramework/flux.2-klein-4b-4bit --provider mlx-gen
abstractvision download AbstractFramework/qwen-image-edit-2511-4bit --provider mlx-gen
# Download FIBO for image editing
abstractvision download briaai/Fibo-Edit --provider mlx-gen
# Download Wan 2.2 for video generation
abstractvision download Wan-AI/Wan2.2-TI2V-5B-Diffusers --provider mlx-gen
from abstractvision import VisionManager, LocalAssetStore
from abstractvision.backends import (
    OpenAICompatibleBackendConfig,
    OpenAICompatibleVisionBackend,
)
# Configure backend
backend = OpenAICompatibleVisionBackend(
    config=OpenAICompatibleBackendConfig(
        base_url="http://localhost:1234/v1",
    )
)
# Create manager with artifact storage
vm = VisionManager(backend=backend, store=LocalAssetStore())
# Generate an image (returns artifact ref)
result = vm.generate_image("a cinematic photo of a red fox in snow")
print(result) # {"$artifact": "...", "content_type": "image/png"}
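For orientation, the sketch below builds (without sending) the kind of HTTP request an OpenAI-compatible image endpoint accepts. It is not taken from AbstractVision's source: the `build_image_request` helper is hypothetical, the payload fields (`model`, `prompt`, `n`, `size`) follow the standard OpenAI Images API, and `"placeholder-model"` must be replaced with a model your server actually serves.

```python
import json
import urllib.request


def build_image_request(
    base_url: str,
    prompt: str,
    model: str = "placeholder-model",  # assumption: substitute a real model id
    size: str = "1024x1024",
) -> urllib.request.Request:
    """Build a POST request for <base_url>/images/generations (not sent)."""
    url = base_url.rstrip("/") + "/images/generations"
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_image_request(
    "http://localhost:1234/v1",
    "a cinematic photo of a red fox in snow",
)
```

Passing `req` to `urllib.request.urlopen` would send it; a hosted service would normally also require an `Authorization` header.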
# One-shot text-to-image (stores result, prints artifact ref)
abstractvision t2i "a studio photo of an espresso machine" --open
# One-shot image-to-image edit
abstractvision i2i --image ./input.png "make it watercolor" --open
# Local MLX-Gen text-to-video
abstractvision t2v --provider mlx-gen --model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
  "a red fox walking through snow" --frames 121 --fps 24 --steps 50 --open
# Interactive CLI/REPL
abstractvision cli
# Local web playground
abstractvision playground --port 8091
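When choosing values for --frames and --fps in the text-to-video command above, clip length is simply frames divided by frames per second. The helper below is a hypothetical convenience for that arithmetic, not part of the AbstractVision API.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Return clip duration in seconds for a frame count and frame rate."""
    if frames <= 0 or fps <= 0:
        raise ValueError("frames and fps must be positive")
    return frames / fps


# The values used in the command above: 121 frames at 24 fps.
duration = clip_seconds(121, 24)
```

With 121 frames at 24 fps the clip runs for roughly five seconds.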
The public API surface is built around VisionManager for direct usage, the capability registry for model discovery, and the AbstractCore plugin for ecosystem integration.
from abstractvision import VisionManager, LocalAssetStore
# `backend` is a configured backend instance (for example, OpenAICompatibleVisionBackend)
vm = VisionManager(backend=backend, store=LocalAssetStore())
# Text-to-Image
result = vm.generate_image("a watercolor lighthouse")
# Image-to-Image
edit = vm.edit_image("make it sunset", image="photo.png")
# Text-to-Video
video = vm.generate_video("a fox in the snow")
# Image-to-Video
clip = vm.image_to_video("slow push-in", image="frame.png")
from abstractvision import VisionModelCapabilitiesRegistry
reg = VisionModelCapabilitiesRegistry()
# Query model capabilities
reg.supports("runwayml/stable-diffusion-v1-5", "text_to_image") # True
reg.supports("Qwen/Qwen-Image-Edit-2511", "image_to_image") # True
# List tasks and models
reg.list_tasks()
reg.models_for_task("text_to_image")
reg.models_for_task("image_to_image")
from abstractcore import create_llm
llm = create_llm("openai")
# Generate via capability plugin
img = llm.vision.t2i("a studio product photo")
# Provider catalog discovery
models = llm.vision.list_provider_models(provider="openai")
# Tool integration for agent workflows
from abstractvision.integrations.abstractcore import make_vision_tools
# `vm` is an existing VisionManager instance
tools = make_vision_tools(vision_manager=vm)
OpenAI-compatible HTTP is the default backend. Remote HTTP via /v1/images/generations. Works with OpenAI, local servers, and any compatible endpoint. Supports T2I, I2I, and optional T2V/I2V.
MLX-Gen offers Apple Silicon q4/q8 optimized presets. FLUX.2, Qwen, Z-Image, ERNIE, FIBO families for images. Wan 2.2 TI2V for video. Cache-only defaults; downloads are explicit.
Diffusers provides full Hugging Face pipeline support. SD 1.x/XL/3.x, FLUX.2, Qwen Image Edit, ERNIE Image. Auto device selection (CUDA, MPS, CPU). Configurable precision and steps.
stable-diffusion.cpp runs GGUF diffusion via sd-cli or Python bindings. Auto-installs the binary on first use. Curated FLUX/Qwen GGUF bundles with companion resolution. Metal and CUDA acceleration.
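The automatic device selection mentioned for the Diffusers backend can be pictured as a simple preference order. The sketch below assumes a CUDA, then MPS, then CPU priority; that ordering and the `pick_device` helper are illustrative assumptions, not the library's confirmed behavior.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device name using an assumed CUDA > MPS > CPU preference."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags would typically come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
device = pick_device(cuda_available=False, mps_available=True)
```

Keeping the decision in a pure function makes the fallback order easy to test without any accelerator present.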