Model-agnostic text-to-music and text-to-audio generation for the Abstract ecosystem. Remote-first with ACE Music and ElevenLabs; local generation via ACE-Step and Stable Audio 3 on Apple Silicon and GPU.
from abstractmusic import MusicManager
mm = MusicManager(backend="acemusic")
# Generate music from a text prompt
audio = mm.t2m("ambient lo-fi study music", duration=30)
# Save to file
with open("out.wav", "wb") as f:
    f.write(audio)
AbstractMusic is a model-agnostic text-to-music and text-to-audio library. The base install is import-light with remote backends (ACE Music, ElevenLabs). Local inference stacks for ACE-Step, Stable Audio 3, and MusicGen live behind optional extras.
The acemusic backend calls a hosted API out of the box. The elevenlabs backend targets ElevenLabs Music endpoints. Both are stdlib-only with no heavy dependencies in the base install.
The acestep backend uses AbstractMusic-owned orchestration around Diffusers AceStepPipeline with HuggingFace model weights. Runs locally on Apple Silicon (MPS) and CUDA GPUs, with automatic CPU fallback.
Separates text planning from audio synthesis. The built-in deterministic planner is dependency-free. Host applications can inject smarter planners through MusicManager or plugin configuration for prompt enhancement and lyrics generation.
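The split between planning and synthesis described above can be pictured with a small sketch. Everything named here (`MusicPlan`, `TextPlanner`, `DeterministicPlanner`, `UppercasePlanner`, `plan_then_synthesize`) is hypothetical and chosen for illustration; it is not the AbstractMusic API, only one plausible shape for a dependency-free default planner that a host application can replace.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class MusicPlan:
    """Hypothetical planner output: the prompt to synthesize plus optional lyrics."""
    prompt: str
    lyrics: Optional[str] = None


class TextPlanner(Protocol):
    """Hypothetical interface that any planner has to satisfy."""

    def plan(self, prompt: str, duration: float) -> MusicPlan: ...


class DeterministicPlanner:
    """Dependency-free planner: the same input always yields the same plan."""

    def plan(self, prompt: str, duration: float) -> MusicPlan:
        cleaned = " ".join(prompt.split())  # collapse stray whitespace
        return MusicPlan(prompt=f"{cleaned}, {int(duration)} seconds")


class UppercasePlanner:
    """Stand-in for a smarter planner injected by the host (for example LLM-backed)."""

    def plan(self, prompt: str, duration: float) -> MusicPlan:
        return MusicPlan(prompt=prompt.upper(), lyrics="la la la")


def plan_then_synthesize(prompt: str, duration: float,
                         planner: Optional[TextPlanner] = None) -> MusicPlan:
    """Run only the planning step; a real backend would consume the returned plan."""
    active = planner or DeterministicPlanner()
    return active.plan(prompt, duration)
```

Because the synthesis side only sees a `MusicPlan`, swapping the planner does not touch backend code, which is the point of keeping the two stages apart.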
All samples below were generated locally on Apple Silicon using ACE-Step 1.5 through the AbstractMusic API. 15-second clips, text prompt only.
Ethereal ambient soundscape
Orchestral brass, soaring strings
Smooth saxophone, walking bass
Upbeat pop with synths
Dreamy lo-fi hip hop, vinyl crackle
Aggressive dubstep, glitchy synths
Generated via: abstractmusic --backend stable-audio-3 t2m "prompt" --duration 15
Generate music from text prompts using remote APIs or local models. The manager stays thin and model-agnostic while backends handle provider-specific behavior.
Default remote backend. Calls a hosted ACE Music API with an API key. Lightweight stdlib-only client. Supports prompt, lyrics, duration, BPM, seed, and format parameters.
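To illustrate what a stdlib-only client for these parameters can look like, here is a minimal sketch that builds (but does not send) a JSON request with `urllib.request`. The endpoint URL, the payload field names, and the Bearer authorization scheme are all assumptions made for the example; only the `ACEMUSIC_API_KEY` variable name and the parameter list come from the text above.

```python
import json
import os
import urllib.request

# Placeholder endpoint: NOT the real ACE Music URL.
HYPOTHETICAL_ENDPOINT = "https://api.example.invalid/v1/music"


def build_request(prompt, duration, *, lyrics=None, bpm=None, seed=None,
                  audio_format="wav", api_key=None):
    """Build a JSON POST request using only the standard library."""
    key = api_key or os.environ.get("ACEMUSIC_API_KEY")
    if not key:
        raise RuntimeError("ACEMUSIC_API_KEY is not set")
    # Field names are assumptions; the hosted API may use different ones.
    payload = {"prompt": prompt, "duration": duration, "format": audio_format}
    # Optional parameters are included only when the caller provides them.
    for name, value in (("lyrics", lyrics), ("bpm", bpm), ("seed", seed)):
        if value is not None:
            payload[name] = value
    return urllib.request.Request(
        HYPOTHETICAL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A real client would pass the request to `urllib.request.urlopen` and read the audio bytes from the response; that step is left out so the sketch stays offline.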
Stdlib-only remote backend for ElevenLabs Music endpoints. Supports provider-neutral MusicCompositionPlan for structured music requests. Requires ELEVENLABS_API_KEY.
Local generation via Diffusers AceStepPipeline. MIT-licensed XL Turbo checkpoint. Supports lyrics, vocal language, BPM, and seed. MPS bfloat16 preferred on Apple Silicon with automatic fallbacks.
Internal text-to-music path for stabilityai/stable-audio-3-small-music. AbstractMusic-owned runtime code with HuggingFace weights. Small Music validated up to 120 seconds.
ACE-Step prefers PyTorch MPS with bfloat16 when supported, then float32, with CPU float32 as the final fallback. Automatic MPS memory capping and watermark validation for 18 GB+ machines.
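The fallback order described above can be written down as plain data, which makes it easy to test without PyTorch installed. This is a sketch under assumptions: `precision_candidates` and `first_working` are hypothetical helpers, the availability flags would come from the runtime in practice, and the CUDA path is left out because the text only spells out the MPS chain.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str]  # (device, dtype)


def precision_candidates(mps_available: bool,
                         mps_bf16_supported: bool) -> List[Candidate]:
    """Return (device, dtype) pairs in the order they should be attempted."""
    candidates: List[Candidate] = []
    if mps_available:
        if mps_bf16_supported:
            candidates.append(("mps", "bfloat16"))
        candidates.append(("mps", "float32"))
    candidates.append(("cpu", "float32"))  # final fallback, always present
    return candidates


def first_working(candidates: List[Candidate],
                  try_load: Callable[[str, str], None]) -> Candidate:
    """Return the first pair that try_load accepts; try_load raises RuntimeError on failure."""
    errors = []
    for device, dtype in candidates:
        try:
            try_load(device, dtype)
            return device, dtype
        except RuntimeError as exc:
            errors.append(f"{device}/{dtype}: {exc}")
    raise RuntimeError("no usable device: " + "; ".join(errors))
```

Keeping the ordering separate from the loading attempt means the policy can be checked on any machine, while the hardware-dependent part stays behind the `try_load` callback.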
Optional --enhance-prompt and --auto-lyrics flags. Structure prompting enabled by default for 45+ second generations. Planner results include a provider-neutral composition_plan.
Artifact screening with WAV/music-likeness inspection and repetition/novelty quality gates. Validation state is explicit through smoke metrics, tests, and registry status.
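As a rough illustration of WAV inspection, the sketch below reads duration and peak level from 16-bit PCM bytes with the standard `wave` module and applies a very simple gate. It is a simplified stand-in, not the library's music-likeness or repetition checks; `inspect_wav`, `passes_basic_gate`, `make_sine_wav`, and the thresholds are hypothetical.

```python
import io
import math
import struct
import wave


def inspect_wav(data: bytes) -> dict:
    """Read basic facts from 16-bit PCM WAV bytes using only the stdlib."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        raw = wav.readframes(frames)
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return {"duration": frames / rate, "sample_rate": rate,
            "channels": channels, "peak": peak}


def passes_basic_gate(info: dict, requested: float,
                      tolerance: float = 0.5, min_peak: float = 0.01) -> bool:
    """Reject clips that are near-silent or far from the requested duration."""
    close_enough = abs(info["duration"] - requested) <= tolerance
    return close_enough and info["peak"] >= min_peak


def make_sine_wav(seconds: float, freq: float = 440.0, rate: int = 8000,
                  amplitude: float = 0.5) -> bytes:
    """Synthesize a mono test tone so the example needs no external files."""
    count = int(seconds * rate)
    values = (int(amplitude * 32767 * math.sin(2 * math.pi * freq * n / rate))
              for n in range(count))
    pcm = struct.pack(f"<{count}h", *values)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

For example, `passes_basic_gate(inspect_wav(make_sine_wav(1.0)), requested=1.0)` accepts the test tone, while the same call on a zero-amplitude clip is rejected as silent.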
Packaged music_model_capabilities.json declares model metadata, license, and precision policy. Discovery via available_providers(), list_models(), and capability_catalog() without loading weights.
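Discovery without loading weights amounts to reading metadata from a registry. The sketch below assumes a made-up schema for illustration; the real `music_model_capabilities.json` may be structured differently, and the free functions here only borrow the names `available_providers` and `list_models` without claiming to match their actual signatures.

```python
import json
from typing import List, Optional

# Hypothetical registry content; the packaged file's real schema may differ.
# A null license means "not stated in this example", not "unlicensed".
EXAMPLE_REGISTRY = json.loads("""
{
  "models": [
    {"id": "ACE-Step/acestep-v15-xl-turbo-diffusers",
     "provider": "acestep", "license": "MIT"},
    {"id": "stabilityai/stable-audio-3-small-music",
     "provider": "stable-audio-3", "license": null}
  ]
}
""")


def available_providers(registry: dict) -> List[str]:
    """List provider names found in the registry, sorted for stable output."""
    return sorted({model["provider"] for model in registry["models"]})


def list_models(registry: dict, provider: Optional[str] = None) -> List[str]:
    """List model ids, optionally filtered by provider. No weights are touched."""
    return [
        model["id"]
        for model in registry["models"]
        if provider is None or model["provider"] == provider
    ]
```

Since both functions only walk a parsed JSON document, they stay cheap enough to call at import time or from a UI that lists engines.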
Generate music in minutes. The base install is lightweight with remote backends; local inference is an explicit extra.
# Base install (remote ACE Music + ElevenLabs backends)
pip install abstractmusic
# Local ACE-Step generation
pip install "abstractmusic[acestep]"
# Local Stable Audio 3
pip install "abstractmusic[stable-audio-3]"
# Full local stack (Apple Silicon)
pip install "abstractmusic[all-apple]"
# Full local stack (GPU)
pip install "abstractmusic[all-gpu]"
# Remote generation (default ACE Music backend)
abstractmusic t2m "ambient lo-fi study music" --out out.wav --duration 30
# ElevenLabs backend
abstractmusic --backend elevenlabs t2m "cinematic instrumental synth cue" \
  --format mp3 --out out.mp3 --duration 30
# Local ACE-Step generation
abstractmusic --backend acestep t2m "ambient lo-fi study music" \
  --out out.wav --duration 10
# Enhanced prompt with auto-lyrics
abstractmusic --backend acestep t2m "heroic fantasy epic music" \
  --enhance-prompt --auto-lyrics --print-plan --out out.wav --duration 30
# Local Stable Audio 3
abstractmusic --backend stable-audio-3 t2m "rhythmic space shooter game music" \
  --out out.wav --duration 30 --steps 16
from abstractmusic import MusicManager
# Remote generation (reads ACEMUSIC_API_KEY from env)
mm = MusicManager(backend="acemusic")
audio = mm.t2m("ambient lo-fi study music", duration=30)
# Local ACE-Step generation
mm_local = MusicManager(backend="acestep")
audio = mm_local.t2m(
    "epic orchestral film score",
    duration=30,
    seed=42,
)
# Artifact-based generation
asset = mm.generate_audio(
    prompt="jazz piano trio",
    duration=30,
    format="wav",
)
# Start the interactive REPL
abstractmusic cli
# Inside the REPL:
# /engines — list available backends
# /models — list known models by engine
# /engine acestep — switch to local ACE-Step
# /model ACE-Step/acestep-v15-xl-turbo-diffusers
# /status — show active engine, model, params
# /download on — enable HuggingFace downloads
The public API is built around MusicManager for direct usage and the AbstractCore capability plugin for ecosystem integration.
from abstractmusic import MusicManager
mm = MusicManager(backend="acemusic")
# Generate audio bytes directly
audio_bytes = mm.t2m(
    prompt="ambient lo-fi study music",
    duration=30,
    format="wav",
    seed=42,
)
# Generate with artifact metadata
asset = mm.generate_audio(
    prompt="cinematic orchestral cue",
    duration=60,
    lyrics="Rise above the shadows...",
    vocal_language="en",
    bpm=120,
)
mm = MusicManager(
    backend="acestep",
    text_planner_mode="auto",  # "auto" | "required" | "off"
)
# Auto mode: uses injected provider when present,
# falls back to the deterministic planner
audio = mm.t2m(
    "epic battle music with choir",
    duration=45,  # structure-prompt enabled by default at 45s+
)
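The three planner modes can be read as a small decision table. The sketch below is one interpretation, not the library's implementation: the "auto" behavior follows the comment in the snippet above, while the meanings given to "required" (fail when nothing is injected) and "off" (skip planning) are assumptions based on the names. `resolve_planner` and `uses_structure_prompt` are hypothetical helpers.

```python
STRUCTURE_PROMPT_MIN_SECONDS = 45  # threshold taken from the text above


def resolve_planner(mode, injected=None):
    """Pick which planner runs for a given text_planner_mode value."""
    if mode == "off":
        return None  # assumed: no planning at all
    if mode == "required":
        if injected is None:
            # assumed: "required" refuses to fall back silently
            raise ValueError("text_planner_mode='required' needs an injected planner")
        return injected
    if mode == "auto":
        # documented: use the injected provider, else the deterministic planner
        return injected if injected is not None else "deterministic"
    raise ValueError(f"unknown text_planner_mode: {mode!r}")


def uses_structure_prompt(duration):
    """Structure prompting is on by default from 45 seconds upward."""
    return duration >= STRUCTURE_PROMPT_MIN_SECONDS
```

With these rules, `resolve_planner("auto")` falls back to the deterministic planner, and a 45-second request crosses the structure-prompt threshold while a 44-second one does not.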
from abstractcore import create_llm
llm = create_llm("openai")
# Generate music via capability plugin
audio = llm.music.t2m("chill ambient study beats", duration=30)
# Provider and model discovery (no weight loading)
providers = llm.music.available_providers()
models = llm.music.list_models(provider="acestep")
catalog = llm.music.capability_catalog()
# Local engine residency (acestep, stable-audio-3)
llm.music.load_resident_model(provider="acestep")
llm.music.list_resident_models()
llm.music.unload_resident_model(provider="acestep")
Default remote backend. Hosted API with ACEMUSIC_API_KEY. Stdlib-only client. Supports tagged <prompt> mode for strict duration enforcement.
Stdlib-only remote backend for ElevenLabs Music endpoints. Supports MusicCompositionPlan for structured requests. Requires ELEVENLABS_API_KEY.
Local generation via AceStepPipeline. MIT XL Turbo checkpoint. Lyrics, vocal language, BPM, seed. MPS bfloat16 → float32 → CPU fallback chain.
Internal stable-audio-3-small-music runtime. HuggingFace weights only, no upstream package. Validated at 30s and 120s. Requires HF gate approval.