CAPABILITY PLUGIN

AbstractVision

Model-agnostic generative vision API for images and video. Text-to-image, image-to-image, text-to-video, and image-to-video with backends for MLX-Gen, Diffusers, stable-diffusion.cpp, and OpenAI-compatible HTTP.

from abstractvision import VisionManager, LocalAssetStore

# `backend` is any configured backend instance (see Quick Start)
vm = VisionManager(backend=backend, store=LocalAssetStore())

# Text to Image
img = vm.generate_image("a red fox in snow")

# Image to Image
edit = vm.edit_image("make it watercolor", image="input.png")

Generative Vision for the Abstract Ecosystem

AbstractVision provides a model-agnostic generative vision API covering images and video. It ships a capability registry, artifact-ref outputs, and backends for both remote and local inference, ranging from cloud APIs to quantized MLX models on Apple Silicon.

VisionManager Orchestrator

Thin orchestration layer that delegates to pluggable backends. Supports text_to_image, image_to_image, text_to_video, and image_to_video with optional artifact-ref outputs for cross-process portability.
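As a rough illustration of the "thin orchestrator, pluggable backend" idea, here is a minimal sketch. `Backend`, `EchoBackend`, and `MiniManager` are hypothetical names made up for this example; they are not AbstractVision's classes, and the real VisionManager may be structured differently.

```python
from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    """Hypothetical backend interface: one method per generation task."""

    def text_to_image(self, prompt: str) -> bytes: ...


@dataclass
class EchoBackend:
    """Stand-in backend that returns the prompt as bytes instead of an image."""

    tag: str = "echo"

    def text_to_image(self, prompt: str) -> bytes:
        return f"{self.tag}:{prompt}".encode("utf-8")


class MiniManager:
    """Thin orchestrator: validates input, then forwards to the backend."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def generate_image(self, prompt: str) -> bytes:
        if not prompt.strip():
            raise ValueError("prompt must not be empty")
        return self.backend.text_to_image(prompt)


manager = MiniManager(backend=EchoBackend())
image_bytes = manager.generate_image("a red fox in snow")
```

Because the manager only depends on the method signature, swapping a remote backend for a local one would not change calling code in this sketch.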

Capability Registry

Packaged vision_model_capabilities.json declares what each model supports. Query tasks, models, and capabilities programmatically. Curated entries with license and precision metadata per ADR 0005.
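To make the registry idea concrete, the sketch below loads a small JSON document and answers the same three kinds of query. The JSON schema and the `SketchRegistry` class are assumptions for illustration only; the packaged vision_model_capabilities.json may be laid out differently and carries more metadata (license, precision).

```python
import json

# Hypothetical schema; the two entries mirror models named on this page.
CAPABILITIES_JSON = """
{
  "models": {
    "runwayml/stable-diffusion-v1-5": {"tasks": ["text_to_image"]},
    "Qwen/Qwen-Image-Edit-2511": {"tasks": ["image_to_image"]}
  }
}
"""


class SketchRegistry:
    """Illustrative stand-in, not VisionModelCapabilitiesRegistry."""

    def __init__(self, raw: str) -> None:
        self._models = json.loads(raw)["models"]

    def supports(self, model_id: str, task: str) -> bool:
        entry = self._models.get(model_id)
        return entry is not None and task in entry["tasks"]

    def list_tasks(self) -> list:
        return sorted({t for e in self._models.values() for t in e["tasks"]})

    def models_for_task(self, task: str) -> list:
        return sorted(m for m, e in self._models.items() if task in e["tasks"])


registry = SketchRegistry(CAPABILITIES_JSON)
```

Keeping capabilities in data rather than code means a new model can be declared without touching backend logic.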

AbstractCore Plugin

Discovered automatically via entry points. Exposes llm.vision.t2i(...), llm.vision.i2i(...), provider catalog discovery, and model residency control for local backends through the unified capability contract.
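Entry-point discovery generally relies on the standard library's `importlib.metadata`. The sketch below shows the mechanism only; the group name `example.capability_plugins` is a placeholder invented for this example, not the group AbstractCore actually scans.

```python
from importlib.metadata import entry_points

# Hypothetical group name, used only for illustration.
GROUP = "example.capability_plugins"


def discover_plugins(group: str) -> dict:
    """Map entry-point names to their 'module:attr' targets without importing them."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict of lists keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


plugins = discover_plugins(GROUP)
```

A host application would typically call `ep.load()` on each match to import the plugin lazily; with the placeholder group above the result is most likely empty.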

Four Backends, Four Generation Modes

Generate images and video with the backend that fits your hardware and workflow. Every backend normalizes parameters through the shared model capability registry.

Text-to-Image

Generate images from text prompts across all four backends. Supports Stable Diffusion, FLUX.2, Qwen Image, ERNIE, FIBO, and OpenAI models with configurable steps, guidance scale, and resolution.

Image-to-Image

Edit and transform existing images. Supported across MLX-Gen (FLUX.2 klein/base, Qwen Image Edit, FIBO Edit), Diffusers, and OpenAI-compatible backends. Mask inputs supported where the runtime allows.

Text-to-Video

Local video generation via MLX-Gen Wan 2.2 TI2V on Apple Silicon. Remote video via OpenAI-compatible HTTP endpoints. Frame/step progress reporting in shell and Python callbacks.

Image-to-Video

Animate a first-frame image into video. MLX-Gen Wan 2.2 TI2V handles first-frame-to-video generation locally; remote endpoints are available through the OpenAI-compatible backend.

MLX-Gen Backend

Apple Silicon-optimized q4/q8 quantized model presets. FLUX.2, Qwen Image, Z-Image, ERNIE, and FIBO families. Wan 2.2 TI2V for video. Published in the AbstractFramework Hugging Face collection.

Diffusers Backend

Full Hugging Face Diffusers pipeline support. Stable Diffusion 1.x/XL/3.x, FLUX.2, Qwen Image Edit, and more. Device auto-detection (CUDA, MPS, CPU) with configurable precision.
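Device auto-detection of this kind is commonly written against PyTorch's availability checks. The following is a sketch of one plausible preference order (CUDA, then MPS, then CPU); `pick_device` is a hypothetical helper, and the Diffusers backend's actual logic may differ.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring accelerators when present."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed: nothing to accelerate with.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older PyTorch builds, so probe defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```

The returned string can be passed to a pipeline's `.to(device)` call.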

sd.cpp Backend

GGUF diffusion models via sd-cli or Python bindings. Auto-installs sd-cli on first use. Curated FLUX/Qwen GGUF bundles with automatic VAE/text-encoder companion resolution.

Artifact Refs

Generated outputs return lightweight JSON artifact refs ({"$artifact": ...}) that travel across processes. LocalAssetStore for direct storage, RuntimeArtifactStoreAdapter for AbstractRuntime integration.
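The value of an artifact ref is that the heavy bytes stay in a store while a small JSON object moves between processes. Here is a minimal sketch of that pattern using a content-addressed directory; `SketchAssetStore`, its `put`/`get` methods, and the choice of a SHA-256 ID are assumptions for illustration, not the behavior of LocalAssetStore.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class SketchAssetStore:
    """Hypothetical content-addressed store; not AbstractVision's LocalAssetStore."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes, content_type: str) -> dict:
        # Use the SHA-256 of the content as the artifact ID.
        artifact_id = hashlib.sha256(data).hexdigest()
        (self.root / artifact_id).write_bytes(data)
        return {"$artifact": artifact_id, "content_type": content_type}

    def get(self, ref: dict) -> bytes:
        return (self.root / ref["$artifact"]).read_bytes()


store = SketchAssetStore(Path(tempfile.mkdtemp()))
ref = store.put(b"\x89PNG fake image bytes", "image/png")

# The ref is plain JSON, so it can pass through a queue, a file, or an HTTP body.
wire = json.dumps(ref)
restored = store.get(json.loads(wire))
```

Any process that can reach the same store can resolve the ref back to bytes, which is what makes the ref portable.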

Install & First Image

Get generating in minutes. The base install is lightweight; local inference runtimes are explicit extras.

Installation

# Base install (OpenAI-compatible HTTP backend)
pip install abstractvision

# Apple Silicon local (MLX-Gen + Diffusers + sd.cpp)
pip install "abstractvision[all-apple]"

# GPU local (Diffusers + sd.cpp)
pip install "abstractvision[all-gpu]"

# Individual backends
pip install "abstractvision[mlx-gen]"   # Apple Silicon MLX-Gen
pip install "abstractvision[diffusers]"  # Hugging Face Diffusers
pip install "abstractvision[sdcpp]"      # stable-diffusion.cpp

Download Models

# Browse available presets
abstractvision catalog --provider mlx-gen

# Download curated MLX q4 presets
abstractvision download AbstractFramework/flux.2-klein-4b-4bit --provider mlx-gen
abstractvision download AbstractFramework/qwen-image-edit-2511-4bit --provider mlx-gen

# Download FIBO for image editing
abstractvision download briaai/Fibo-Edit --provider mlx-gen

# Download Wan 2.2 for video generation
abstractvision download Wan-AI/Wan2.2-TI2V-5B-Diffusers --provider mlx-gen

Quick Start (Python)

from abstractvision import VisionManager, LocalAssetStore
from abstractvision.backends import (
    OpenAICompatibleBackendConfig,
    OpenAICompatibleVisionBackend,
)

# Configure backend
backend = OpenAICompatibleVisionBackend(
    config=OpenAICompatibleBackendConfig(
        base_url="http://localhost:1234/v1",
    )
)

# Create manager with artifact storage
vm = VisionManager(backend=backend, store=LocalAssetStore())

# Generate an image (returns artifact ref)
result = vm.generate_image("a cinematic photo of a red fox in snow")
print(result)  # {"$artifact": "...", "content_type": "image/png"}

CLI Commands

# One-shot text-to-image (stores result, prints artifact ref)
abstractvision t2i "a studio photo of an espresso machine" --open

# One-shot image-to-image edit
abstractvision i2i --image ./input.png "make it watercolor" --open

# Local MLX-Gen text-to-video
abstractvision t2v --provider mlx-gen --model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
  "a red fox walking through snow" --frames 121 --fps 24 --steps 50 --open

# Interactive CLI/REPL
abstractvision cli

# Local web playground
abstractvision playground --port 8091
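When choosing `--frames` and `--fps` for the text-to-video command, clip length is simply frames divided by frames per second. A small worked example, using a hypothetical helper name:

```python
def clip_duration_seconds(frames: int, fps: int) -> float:
    """Length of a clip in seconds for a given frame count and frame rate."""
    if frames <= 0 or fps <= 0:
        raise ValueError("frames and fps must be positive")
    return frames / fps


# Values taken from the t2v example above: --frames 121 --fps 24
duration = clip_duration_seconds(frames=121, fps=24)
print(f"{duration:.2f} s")  # 5.04 s
```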

Key Classes & Methods

The public API surface is built around VisionManager for direct usage, the capability registry for model discovery, and the AbstractCore plugin for ecosystem integration.

VisionManager — Orchestrator

from abstractvision import VisionManager, LocalAssetStore

# `backend` is a configured backend instance, as in Quick Start
vm = VisionManager(backend=backend, store=LocalAssetStore())

# Text-to-Image
result = vm.generate_image("a watercolor lighthouse")

# Image-to-Image
edit = vm.edit_image("make it sunset", image="photo.png")

# Text-to-Video
video = vm.generate_video("a fox in the snow")

# Image-to-Video
clip = vm.image_to_video("slow push-in", image="frame.png")

Capability Registry

from abstractvision import VisionModelCapabilitiesRegistry

reg = VisionModelCapabilitiesRegistry()

# Query model capabilities
reg.supports("runwayml/stable-diffusion-v1-5", "text_to_image")  # True
reg.supports("Qwen/Qwen-Image-Edit-2511", "image_to_image")  # True

# List tasks and models
reg.list_tasks()
reg.models_for_task("text_to_image")
reg.models_for_task("image_to_image")

AbstractCore Plugin Integration

from abstractcore import create_llm

llm = create_llm("openai")

# Generate via capability plugin
img = llm.vision.t2i("a studio product photo")

# Provider catalog discovery
models = llm.vision.list_provider_models(provider="openai")

# Tool integration for agent workflows
# (`vm` is a VisionManager instance, as in Quick Start)
from abstractvision.integrations.abstractcore import make_vision_tools
tools = make_vision_tools(vision_manager=vm)

Backend Support Matrix

OpenAI-Compatible

Default backend. Remote HTTP via /v1/images/generations. Works with OpenAI, local servers, and any compatible endpoint. Supports T2I, I2I, and optional T2V/I2V.
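For orientation, this is roughly what a raw call to an OpenAI-style `/v1/images/generations` endpoint looks like with only the standard library. It is a sketch of the wire format, not the backend's implementation: the helper names are hypothetical, `example-model` is a placeholder, and some servers ignore or reject `response_format`.

```python
import base64
import json
import urllib.request


def build_image_request(
    base_url: str, prompt: str, model: str, api_key: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/images/generations."""
    payload = {"model": model, "prompt": prompt, "n": 1, "response_format": "b64_json"}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def decode_first_image(response_body: bytes) -> bytes:
    """Extract the first base64-encoded image from an OpenAI-style response."""
    body = json.loads(response_body)
    return base64.b64decode(body["data"][0]["b64_json"])


request = build_image_request(
    "http://localhost:1234/v1", "a red fox in snow", model="example-model"
)
# To actually send it: decode_first_image(urllib.request.urlopen(request).read())
```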

MLX-Gen

Apple Silicon q4/q8 optimized presets. FLUX.2, Qwen, Z-Image, ERNIE, FIBO families for images. Wan 2.2 TI2V for video. Cache-only defaults; downloads are explicit.

Diffusers

Full Hugging Face pipeline support. SD 1.x/XL/3.x, FLUX.2, Qwen Image Edit, ERNIE Image. Auto device selection (CUDA, MPS, CPU). Configurable precision and steps.

sd.cpp (GGUF)

GGUF diffusion via sd-cli or Python bindings. Auto-installs binary on first use. Curated FLUX/Qwen GGUF bundles with companion resolution. Metal and CUDA acceleration.