API access will be available shortly.
Uttera is a self-hosted, OpenAI-compatible voice stack — text-to-speech, speech-to-text, voice cloning, and translation — designed to run on your own GPU with no data sent to third parties.
While we finish the managed cloud, the entire stack is open source and usable today. Benchmarks, servers, protocol, and Dockerfiles are all on GitHub.
uttera-tts-vllm: High-throughput TTS · nano-vLLM + VoxCPM2
uttera-tts-hotcold: Multi-backend TTS · Coqui XTTS-v2 · VoxCPM2
uttera-stt-vllm: High-throughput STT · vLLM + Whisper-v3-turbo
uttera-stt-hotcold: Multi-model STT · openai-whisper + LibreTranslate
uttera-benchmarks: Protocol, harness, and published results
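
Because the stack is described as OpenAI-compatible, a self-hosted TTS server should in principle accept requests shaped like OpenAI's `/v1/audio/speech` endpoint. The sketch below builds such a request using only the Python standard library. The base URL, port, model name, and voice name are placeholders assumed for illustration, not values documented by Uttera; check the repository README for what your deployment actually expects.

```python
import json
import urllib.request

# Assumption: a self-hosted TTS server reachable at this address.
# The host, port, and "/v1" prefix are placeholders, not documented Uttera values.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text, base_url=BASE_URL, model="tts-1", voice="alloy"):
    """Build (without sending) a POST request shaped like OpenAI's audio/speech call.

    "tts-1" and "alloy" are OpenAI's own example values, used here as
    placeholders; a self-hosted server may expect different names.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

With a server running, `synthesize("Hello from my own GPU")` would write the returned audio to `speech.mp3`. Keeping request construction separate from the network call makes it easy to inspect the payload before anything leaves your machine.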