Latency is the “killer” of voice interfaces. A delay of more than 1 second feels unnatural. Butter AI is optimized for sub-second latency, but your configuration choices significantly impact it.
The Latency Budget
Total Latency = Network + STT + LLM + TTS + Network
You have control over the STT, LLM, and TTS choices.
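As a back-of-the-envelope sketch of this budget (the component numbers below are illustrative assumptions, not measured figures for any specific provider):

```python
# Hypothetical round-trip budget in milliseconds. Each entry is the
# time before the *first* byte/token of the next stage, since streaming
# pipelines overlap the tail of each stage with the head of the next.
budget_ms = {
    "network_in": 50,   # caller -> server
    "stt": 150,         # speech-to-text final transcript
    "llm": 350,         # time to first token
    "tts": 200,         # time to first audio byte
    "network_out": 50,  # server -> caller
}

total_ms = sum(budget_ms.values())
print(f"Total: {total_ms} ms")  # 800 ms -- inside the 1 s threshold
```

Shaving 100ms off any single component buys you real headroom against the 1-second ceiling.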
1. Provider Selection
STT (Speech to Text)
- Deepgram Nova-3 / Flux: Extremely fast and accurate (recommended).
- ElevenLabs Realtime Scribe v2: High quality transcription with low latency.
- Cartesia Ink Whisper: Optimized for fast conversational turns.
- AWS Transcribe: Reliable option for legacy integrations.
LLM (Language Model)
- Google Gemini 2.5 Flash Lite: Optimized for speed and cost-efficiency.
- Groq (Llama 4 Maverick): Ultra-low latency inference for simple tasks.
- OpenAI GPT-4o: State-of-the-art balance of intelligence and speed.
- Anthropic Claude 3.5 Haiku: Fast and capable reasoning.
TTS (Text to Speech)
- Cartesia: Designed specifically for sub-200ms real-time agents.
- ElevenLabs Turbo v2: Best-in-class quality with sub-300ms latency.
- Amazon Polly Generative: Consistent performance and quality.
2. Server Location
Ensure your SIP trunking and user base are geographically close to the server region you selected when creating your organization. Transatlantic round trips add 100ms+ of pure network latency.
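To sanity-check the network leg, you can estimate the round trip to a candidate region by timing a TCP handshake. This is a rough sketch (a TCP connect approximates one round trip; the host you probe is whatever endpoint is relevant to your deployment):

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443) -> float:
    """Rough RTT estimate: time to complete a TCP handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; we only care about the timing
    return (time.perf_counter() - start) * 1000
```

If this number is already 100ms+ before any AI work happens, consider a closer region.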
3. Tool Call Latency
If your agent calls a custom tool (e.g., check_inventory), the agent cannot speak until your API responds.
- Optimize your API: Ensure your endpoints respond in under 200ms.
- Async Handoff: If a task takes a long time (e.g., 5s), don’t make the user wait in silence.
- Bad: (Silence for 5s) -> “Done.”
- Good: “I’ll check that for you, it might take a moment…” -> (Tool Call) -> “Okay, I found it.”
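The "Good" pattern above can be sketched with asyncio: start the slow tool call first, speak the filler line while it runs, and only then await the result. `speak` and `check_inventory` are stand-ins for your own agent hooks, not Butter AI APIs:

```python
import asyncio

transcript: list[str] = []

async def speak(text: str) -> None:
    # Stand-in for the agent's TTS output.
    transcript.append(text)

async def check_inventory(sku: str) -> str:
    # Stand-in for a slow backend call (could be ~5 s in practice).
    await asyncio.sleep(0.1)
    return "in stock"

async def handle_inventory_question(sku: str) -> None:
    task = asyncio.create_task(check_inventory(sku))  # kick off the tool call
    await speak("I'll check that for you, it might take a moment...")
    status = await task  # only now do we wait on the slow call
    await speak(f"Okay, that item is {status}.")

asyncio.run(handle_inventory_question("SKU-123"))
```

The key detail is `create_task` before the filler line: the tool call and the speech overlap instead of running back to back.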
4. Prompt Engineering
A long system prompt takes longer to process (input token latency). Keep your prompt focused. Remove unused instructions or examples.
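A quick way to keep yourself honest is a rough size check on the prompt. The ~4 characters-per-token figure below is a common rule of thumb, not an exact tokenizer:

```python
def approx_tokens(prompt: str) -> int:
    """Back-of-the-envelope token estimate (~4 chars per token)."""
    return len(prompt) // 4

system_prompt = (
    "You are a concise phone agent for Acme Support. "  # hypothetical prompt
    "Answer in one or two short sentences."
)
print(f"~{approx_tokens(system_prompt)} tokens")
```

For exact counts, use your LLM provider's tokenizer; the point is to notice when a prompt has quietly grown by hundreds of tokens.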