CAPABILITY PLUGIN

AbstractVoice

Modular Python voice I/O for AI applications. Text-to-speech, speech-to-text, and voice cloning behind one interface: the base install defaults to remote OpenAI-compatible providers, and optional extras add a full local stack. The same API works across Piper, Supertonic, OpenAI, OmniVoice, and more.

from abstractvoice import VoiceManager

vm = VoiceManager(language="en")

# Speak aloud
vm.speak("Hello from AbstractVoice.")

# Get audio bytes
wav = vm.speak_to_bytes("Headless TTS.", format="wav")

Voice I/O for the Abstract Ecosystem

AbstractVoice is a modular voice I/O library providing text-to-speech (TTS), speech-to-text (STT), and optional voice cloning. It integrates with AbstractCore as a capability plugin, enabling any AI application in the ecosystem to speak and listen.

Remote-First Default

The base install uses OpenAI-compatible TTS and STT by default. Point at any compatible endpoint via OPENAI_BASE_URL or pass remote_base_url directly. No GPU or local model required to get started.
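
The endpoint selection described above can be pictured with a small sketch. The helper name `resolve_base_url`, the precedence order (explicit argument first, then `OPENAI_BASE_URL`), and the fallback URL are assumptions for illustration; the source does not state how AbstractVoice resolves them internally.

```python
import os
from typing import Mapping, Optional

# Assumed fallback; the actual default used by AbstractVoice is not stated in the source.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(
    remote_base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: explicit argument, then OPENAI_BASE_URL, then the default."""
    env = os.environ if env is None else env
    candidate = remote_base_url or env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    # Strip trailing slashes so paths such as /audio/speech can be appended cleanly.
    return candidate.rstrip("/")


# Pointing at a local OpenAI-compatible server
local_url = resolve_base_url("http://localhost:8000/v1/", env={})
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.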

Local Inference Stack

Install abstractvoice[apple] or [gpu] for the full local experience: Supertonic ONNX TTS, Piper TTS, faster-whisper STT, VAD, microphone capture, and local voice cloning engines — all running in-process.

AbstractCore Plugin

Discovered automatically via entry points when installed alongside AbstractCore. Exposes provider/model/voice discovery, TTS/STT execution, and voice cloning through the unified capability contract.
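
For readers unfamiliar with entry-point discovery, the following is a minimal sketch of how a host package can find installed plugins using the standard library. The group name `abstractcore.capabilities` is a placeholder assumption; the group AbstractCore actually scans is not given in the source.

```python
from importlib.metadata import entry_points
from typing import Dict

# Placeholder group name (assumption); not confirmed by the AbstractVoice docs.
ASSUMED_GROUP = "abstractcore.capabilities"


def discover_plugins(group: str = ASSUMED_GROUP) -> Dict[str, object]:
    """Return {entry point name: EntryPoint} for every plugin registered in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older Pythons return a plain dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


plugins = discover_plugins()
# Calling plugins[name].load() would import the plugin object lazily.
```

Because nothing is imported until `load()` is called, a host can list available plugins cheaply and import only the ones it needs.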

Generated with AbstractVoice

Voice samples generated using the AbstractVoice TTS API. The interface is the same regardless of the underlying model or provider.

Supertonic 3 — open-source, local ONNX inference

F1 (Female)

Open-source female voice — welcome message

M1 (Male)

Open-source male voice — durability & continuity

M3 (Male)

Open-source male voice — multimodal capabilities

Generated via: abstractvoice cli, then /tts engine supertonic. The same voices are available programmatically with SupertonicTTSAdapter.

OpenAI TTS — remote, via any OpenAI-compatible endpoint

Nova (Female)

Warm, expressive female voice — welcome message

Onyx (Male, Deep)

Deep, authoritative male voice — ecosystem overview

Alloy (Neutral)

Balanced, neutral voice — durability & observability

Generated via: abstractvoice tts "text" --voice nova

Complete Voice Pipeline

From text-to-speech synthesis to real-time speech recognition and voice cloning, AbstractVoice covers the full voice I/O stack.

Text-to-Speech

Multiple TTS engines: Supertonic 3 ONNX (local, 10 voices), Piper (local, multilingual), OpenAI-compatible HTTP (remote), AudioDiT, and OmniVoice. Buffered or streamed delivery with audio chunk smoothing.
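
Chunk smoothing is easiest to see on raw samples. The sketch below applies a linear fade-in and fade-out to one audio chunk so its edges start and end at zero amplitude. It is an illustration of the general technique, not AbstractVoice's implementation; the function name and the linear ramp are assumptions.

```python
from typing import List


def apply_fade(samples: List[float], fade_len: int) -> List[float]:
    """Return a copy of `samples` with a linear fade at both ends."""
    n = len(samples)
    # Never let the two ramps overlap on very short chunks.
    fade_len = min(fade_len, n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len  # 0.0 at the edge, rising toward 1.0
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out


faded = apply_fade([1.0] * 8, fade_len=2)
```

Ramping the edges to zero removes the abrupt level jump between consecutive chunks, which is what the ear hears as a click.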

Speech-to-Text

Local STT via faster-whisper with model size selection (tiny to large). Remote STT via OpenAI-compatible endpoints. Voice Activity Detection (VAD) with webrtcvad for accurate speech boundary detection.
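
To make the idea of speech boundary detection concrete, here is a deliberately simple energy-threshold detector. It is a stand-in for illustration only: webrtcvad uses a trained classifier rather than a fixed RMS threshold, and the function names and threshold value below are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    frames: Sequence[Sequence[int]], threshold: float = 500.0
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of voiced runs; `end` is exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, frame in enumerate(frames):
        voiced = frame_rms(frame) >= threshold
        if voiced and start is None:
            start = i  # a voiced run begins
        elif not voiced and start is not None:
            segments.append((start, i))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments


silence, loud = [0] * 160, [2000] * 160
segments = speech_segments([silence, loud, loud, silence, loud])
```

A production detector would also add hangover frames so that short pauses inside a sentence do not split it into separate segments.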

Voice Cloning

Clone voices from reference audio using F5-TTS, Chroma-4B, AudioDiT, or OmniVoice engines. Clones are stored locally and packaged as bundles for portability. When reference text is missing, an automatic fallback applies (see ADR 0003).

Voice Profiles

Cross-engine voice profile abstraction with shipped presets per engine. Select voices by language, gender, or profile ID. Runtime TTS switching resets to provider/language defaults automatically.
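
Selection by language, gender, or profile ID can be sketched with a small data model. The `VoiceProfile` class and `select_profiles` function are hypothetical names, and the `"en"` language tag on the Supertonic presets is an assumption; only the ten profile IDs (M1 to M5, F1 to F5) come from the engine description in this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VoiceProfile:
    profile_id: str
    engine: str
    language: str
    gender: str


# Ten fixed Supertonic presets; the language tag is assumed for illustration.
SUPERTONIC_PROFILES: List[VoiceProfile] = [
    VoiceProfile(f"{prefix}{n}", "supertonic", "en", gender)
    for prefix, gender in (("M", "male"), ("F", "female"))
    for n in range(1, 6)
]


def select_profiles(
    profiles: List[VoiceProfile],
    language: Optional[str] = None,
    gender: Optional[str] = None,
    profile_id: Optional[str] = None,
) -> List[VoiceProfile]:
    """Keep profiles matching every filter that was supplied."""
    return [
        p
        for p in profiles
        if (profile_id is None or p.profile_id == profile_id)
        and (language is None or p.language == language)
        and (gender is None or p.gender == gender)
    ]


female_voices = select_profiles(SUPERTONIC_PROFILES, gender="female")
```

Treating an omitted filter as "match anything" lets the same function serve lookups by ID and broader searches by language or gender.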

Streaming Pipeline

LLM-to-TTS streaming bridge via TextToSpeechStream. Incremental text chunking with sentence and soft-boundary segmentation. Audio chunk fading to eliminate clicks at boundaries.
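
The chunking step can be illustrated with a small incremental splitter. This is a sketch of the general approach, not the logic inside TextToSpeechStream: the class name `SentenceChunker`, the `soft_limit` parameter, and the regular expressions are all assumptions.

```python
import re
from typing import List

_SENTENCE_END = re.compile(r"[.!?]+\s+")  # hard boundary: punctuation then whitespace
_SOFT_BREAK = re.compile(r"[,;:]\s+")     # soft boundary used when the buffer grows long


class SentenceChunker:
    """Accumulate streamed text and emit chunks that are safe to synthesize."""

    def __init__(self, soft_limit: int = 80) -> None:
        self.soft_limit = soft_limit
        self._buf = ""

    def feed(self, text: str) -> List[str]:
        self._buf += text
        chunks: List[str] = []
        while True:
            match = _SENTENCE_END.search(self._buf)
            if match is None and len(self._buf) > self.soft_limit:
                # No full sentence yet but the buffer is long: cut at the last soft break.
                soft = list(_SOFT_BREAK.finditer(self._buf))
                match = soft[-1] if soft else None
            if match is None:
                break
            chunk = self._buf[: match.end()].strip()
            self._buf = self._buf[match.end():]
            if chunk:
                chunks.append(chunk)
        return chunks

    def finish(self) -> List[str]:
        """Flush whatever text remains once the stream has ended."""
        rest, self._buf = self._buf.strip(), ""
        return [rest] if rest else []


chunker = SentenceChunker()
emitted: List[str] = []
for token in ["Hello", " world.", " How", " are", " you?", " Fine"]:
    emitted += chunker.feed(token)
tail = chunker.finish()
```

Waiting for whitespace after the punctuation avoids cutting in the middle of a token such as a decimal number, at the cost of emitting each sentence one token later.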

Multilingual

Language-aware voice selection across engines. Piper supports 20+ languages with dedicated voice packs. OmniVoice provides omnilingual TTS with zero-shot cloning across languages.

Offline-First

The REPL runs with allow_downloads=False, so nothing is fetched implicitly; downloads happen only when requested through abstractvoice-prefetch. Once models are cached, everything works without network access.
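
The explicit-download policy amounts to a guard around model loading. The sketch below shows one way such a guard could look; `ensure_model`, `ModelNotCachedError`, and the cache layout are hypothetical and only borrow the `allow_downloads` flag name from the text above.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


class ModelNotCachedError(RuntimeError):
    """Raised when a model is missing and downloads are not permitted."""


def ensure_model(
    cache_dir: Path,
    name: str,
    allow_downloads: bool = False,
    download: Optional[Callable[[Path], None]] = None,
) -> Path:
    """Return the cached model path, downloading only when explicitly allowed."""
    target = Path(cache_dir) / name
    if target.exists():
        return target  # cache hit: no network access needed
    if not allow_downloads or download is None:
        raise ModelNotCachedError(f"{name} is not cached; prefetch it first")
    download(target)
    return target


def fake_download(path: Path) -> None:
    # Stand-in for a real fetch: write a placeholder file into the cache.
    path.write_bytes(b"model-weights")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    try:
        ensure_model(cache, "demo.onnx")
        blocked = False
    except ModelNotCachedError:
        blocked = True  # offline mode refused the implicit download
    fetched = ensure_model(cache, "demo.onnx", allow_downloads=True, download=fake_download)
    cached_name = ensure_model(cache, "demo.onnx").name  # now served from cache
```

Failing loudly on a cache miss, rather than silently fetching, is what makes offline behavior predictable.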

Echo Cancellation

Optional acoustic echo cancellation (AEC) for true barge-in support. Stop-phrase detection for voice-controlled interruption. Configurable voice mode callbacks for speaking behavior.
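
Stop-phrase detection reduces to matching a normalized transcript against a small phrase list. The following is an illustrative sketch; the function names and the default phrases are assumptions, not AbstractVoice's configuration.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'Okay, STOP!' becomes 'okay stop'."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def contains_stop_phrase(
    transcript: str, stop_phrases: Iterable[str] = ("stop", "be quiet")
) -> bool:
    """True when any stop phrase appears as a whole-word sequence in the transcript."""
    padded = f" {normalize(transcript)} "
    for phrase in stop_phrases:
        needle = normalize(phrase)
        # Padding with spaces prevents matches inside longer words.
        if needle and f" {needle} " in padded:
            return True
    return False


interrupted = contains_stop_phrase("Okay, STOP!")
```

Matching on word boundaries keeps a word like "unstoppable" from triggering an interruption.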

Install & First Words

Get up and running with AbstractVoice in minutes. Choose between remote-first (default) or full local inference.

Installation

# Base install (remote OpenAI-compatible TTS/STT)
pip install abstractvoice

# Full local stack for macOS Apple Silicon
pip install "abstractvoice[apple]"

# Full local stack for GPU (Linux/Windows)
pip install "abstractvoice[gpu]"

# Granular extras
pip install "abstractvoice[supertonic,stt,audio-io]"

Prefetch Models (Offline-Friendly)

# Download local TTS models
abstractvoice-prefetch --supertonic
abstractvoice-prefetch --piper en

# Download local STT model
abstractvoice-prefetch --stt small

# Optional cloning backends
abstractvoice-prefetch --omnivoice
abstractvoice-prefetch --openf5

Quick Start (Python)

from abstractvoice import VoiceManager

# Remote TTS (reads OPENAI_API_KEY from env)
vm = VoiceManager(language="en")
vm.speak("Hello from AbstractVoice.")

# Get WAV bytes for headless usage
wav = vm.speak_to_bytes("Headless TTS output.", format="wav")

# Fully local TTS + STT
vm_local = VoiceManager(
    language="en",
    tts_engine="supertonic",
    stt_engine="faster_whisper",
)
vm_local.speak("Running entirely offline.")
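
When working headless, it is often useful to check the returned WAV bytes before saving or streaming them. The sketch below inspects a WAV payload with the standard library. A generated silent clip stands in for the output of `speak_to_bytes`, so the example runs without AbstractVoice installed; `make_silent_wav` and `wav_info` are illustrative helpers, not part of the library.

```python
import io
import wave


def make_silent_wav(duration_s: float = 0.5, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence (stand-in for real TTS output)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(duration_s * sample_rate))
    return buf.getvalue()


def wav_info(data: bytes) -> dict:
    """Read channel count, sample rate, and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


info = wav_info(make_silent_wav(0.5, 16000))
```

The same `wav_info` call should work on any bytes that form a valid PCM WAV file, which is what `format="wav"` suggests.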

CLI & REPL

# Interactive REPL (remote by default)
abstractvoice --verbose

# Fully local REPL
abstractvoice --tts-engine supertonic --stt-engine faster_whisper --verbose

# Local web UI (requires abstractvoice[web])
abstractvoice web --port 5000

Key Classes & Methods

The public API surface is centered on VoiceManager for direct usage and the AbstractCore capability plugin for ecosystem integration.

VoiceManager — Core Façade

from abstractvoice import VoiceManager

vm = VoiceManager(
    language="en",
    tts_engine="openai",       # or "supertonic", "piper", "omnivoice"
    stt_engine="openai",       # or "faster_whisper"
    remote_base_url="...",    # optional OpenAI-compatible endpoint
)

# TTS
vm.speak("Hello world")
wav = vm.speak_to_bytes("Hello", format="wav")

# Engine preload/unload (local engines only)
vm.preload_tts_engine(engine="supertonic")
vm.preload_stt_engine(engine="faster_whisper")
vm.unload_tts_engine()
vm.unload_stt_engine()

# Runtime TTS switching
vm.set_tts_engine("piper")

AbstractCore Plugin Integration

from abstractcore import create_llm

llm = create_llm("openai")

# TTS via capability plugin
wav = llm.voice.tts(
    "Hello from AbstractCore",
    provider="openai",
    model="tts-1",
    voice="alloy",
    format="wav",
)

# STT via capability plugin (audio_bytes is audio data supplied by the caller, e.g. the contents of a WAV file)
text = llm.voice.stt(audio_bytes, provider="faster-whisper", model="small")

# Provider and voice discovery
providers = llm.voice.available_providers()
voices = llm.voice.list_tts_voices(provider="supertonic")
models = llm.voice.list_tts_models(provider="openai")

# Voice cloning
llm.voice.clone_voice(reference_audio="ref.wav", name="my-clone")

# Resident model management (local engines)
llm.voice.load_resident_model(provider="supertonic")
llm.voice.list_resident_models()
llm.voice.unload_resident_model(provider="supertonic")

Streaming TTS (LLM → Voice)

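from abstractvoice import VoiceManager
from abstractvoice.tts.text_to_speech_stream import TextToSpeechStream

vm = VoiceManager(language="en")

# Bridge incremental LLM text to TTS audio
stream = TextToSpeechStream(tts_engine=vm.tts_engine)

# Stand-in for a real LLM token stream: any iterable of text fragments
llm_stream = ["Hello ", "from ", "a ", "streamed ", "reply."]

# Feed tokens as they arrive from the LLM
for token in llm_stream:
    stream.feed(token)

# Signal the end of the stream
stream.finish()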

Available TTS Engines

OpenAI / Compatible

Remote HTTP TTS via /audio/speech. Default engine. Supports any OpenAI-compatible endpoint including local servers.

Supertonic 3

Local ONNX TTS with 10 fixed profiles (M1–M5, F1–F5). Fast, lightweight, no external SDK dependency. Recommended local base TTS.

Piper

Local multilingual TTS with downloadable voice packs. Supports 20+ languages. Voice selection by language, quality tier, and speaker ID.

OmniVoice

Omnilingual local TTS with zero-shot voice cloning. Recommended cloning backend. Heavy model — prefetch required.

AudioDiT

LongCat-AudioDiT TTS with prompt-audio cloning. MIT-licensed vendored model code. Optional heavy backend.