API access will be available shortly.

Uttera is a self-hosted, OpenAI-compatible voice stack — text-to-speech, speech-to-text, voice cloning, and translation — designed to run on your own GPU with no data sent to third parties.

While we finish the managed cloud, the entire stack is open source and usable today. Benchmarks, servers, protocol, and Dockerfiles are all on GitHub.
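
Because the stack is described as OpenAI-compatible, a locally running TTS server can in principle be called the same way as OpenAI's speech endpoint. The sketch below is a minimal illustration using only the Python standard library. It assumes the server exposes the OpenAI-style `/v1/audio/speech` route; the base URL, port, model name, and voice name are placeholders rather than values taken from the Uttera repositories, so check each repository's README for the actual ones.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model="tts-1", voice="alloy",
                         response_format="wav"):
    """Build a POST request in the shape of OpenAI's /v1/audio/speech API.

    The model and voice defaults are OpenAI's names, used here only as
    placeholders; a self-hosted server may expect different identifiers.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, out_path="speech.wav", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With a server listening locally (the address here is hypothetical), `synthesize("http://localhost:8000", "Hello from my own GPU.")` would write the audio to `speech.wav`. Existing OpenAI client libraries should also work if pointed at the same base URL, assuming the compatibility claim holds for the endpoints you use.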

Visit the repositories →

uttera-tts-vllm: High-throughput TTS · nano-vLLM + VoxCPM2
uttera-tts-hotcold: Multi-backend TTS · Coqui XTTS-v2 · VoxCPM2
uttera-stt-vllm: High-throughput STT · vLLM + Whisper-v3-turbo
uttera-stt-hotcold: Multi-model STT · openai-whisper + LibreTranslate
uttera-benchmarks: Protocol, harness, and published results