Latency is the "killer" of voice interfaces. A delay of more than 1 second feels unnatural. Butter AI is optimized for sub-second latency, but your configuration choices significantly affect it.

The Latency Budget

Total Latency = Network + STT + LLM + TTS + Network

You have control over the STT, LLM, and TTS stages.
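To make the budget concrete, here is a minimal sketch of summing the stages against a 1-second target. The component names follow the formula above; the millisecond figures are illustrative assumptions, not measured Butter AI numbers.

```python
# Illustrative latency budget: figures are example values, not benchmarks.
BUDGET_MS = 1000  # stay under ~1 second for a natural-feeling turn

components_ms = {
    "network_in": 50,   # user -> server
    "stt": 150,         # speech to text
    "llm": 350,         # time to first token
    "tts": 200,         # text to speech
    "network_out": 50,  # server -> user
}

total_ms = sum(components_ms.values())
headroom_ms = BUDGET_MS - total_ms
print(f"total={total_ms}ms headroom={headroom_ms}ms")
```

Tracking each stage separately makes it obvious which provider swap or optimization buys back the most headroom.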

1. Provider Selection

STT (Speech to Text)

  • Deepgram Nova-3 / Flux: Extremely fast and accurate (recommended).
  • ElevenLabs Realtime Scribe v2: High quality transcription with low latency.
  • Cartesia Ink Whisper: Optimized for fast conversational turns.
  • AWS Transcribe: Reliable option for legacy integrations.

LLM (Language Model)

  • Google Gemini 2.5 Flash Lite: Optimized for speed and cost-efficiency.
  • Groq (Llama 4 Maverick): Ultra-low latency inference for simple tasks.
  • OpenAI GPT-4o: State-of-the-art balance of intelligence and speed.
  • Anthropic Claude 3.5 Haiku: Fast and capable reasoning.

TTS (Text to Speech)

  • Cartesia: Designed specifically for sub-200ms real-time agents.
  • ElevenLabs Turbo v2: Best-in-class quality with sub-300ms latency.
  • Amazon Polly Generative: Consistent performance and quality.
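The three choices above come together in your agent's pipeline configuration. The sketch below shows one low-latency combination; the dictionary keys, provider identifiers, and the helper function are assumptions for illustration, not the actual Butter AI configuration schema.

```python
# Hypothetical pipeline config favouring low-latency providers.
# Key names and provider/model identifiers are illustrative only.
agent_config = {
    "stt": {"provider": "deepgram", "model": "nova-3"},
    "llm": {"provider": "google", "model": "gemini-2.5-flash-lite"},
    "tts": {"provider": "cartesia"},
}

def missing_stages(config: dict) -> set:
    """Return any of the three required pipeline stages left unconfigured."""
    return {"stt", "llm", "tts"} - set(config)

print("missing:", missing_stages(agent_config) or "none")
```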

2. Server Location

Ensure your SIP trunking and user base are geographically close to the server region you selected when creating your organization. Transatlantic round trips add 100 ms or more of pure network latency.
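A quick way to sanity-check region placement is to time a TCP handshake from where your users (or SIP trunk) sit to the region's endpoint. This is a rough estimate, not a full voice-path measurement, and the hostname you pass is your own choice:

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Estimate network round-trip cost via TCP handshake time, in ms."""
    start = time.perf_counter()
    # create_connection performs the full TCP handshake before returning.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

# Example (replace with your region's endpoint):
# print(f"{tcp_connect_ms('example.com'):.0f} ms")
```

If the handshake alone costs 100 ms+, every conversational turn pays that twice before any model work begins.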

3. Tool Performance

If your agent calls a custom tool (e.g., check_inventory), the agent cannot speak until your API responds.
  • Optimize your API: Ensure your endpoints respond in under 200ms.
  • Async Handoff: If a task takes a long time (e.g., 5 s), don't make the user wait in silence.
    • Bad: (Silence for 5s) -> “Done.”
    • Good: “I’ll check that for you, it might take a moment…” -> (Tool Call) -> “Okay, I found it.”
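The async-handoff pattern above can be sketched with `asyncio`: start the slow tool immediately, speak the filler line while it runs, then deliver the result. `speak` and `check_inventory` are illustrative stand-ins, not Butter AI APIs, and the 0.05 s sleep stands in for a genuinely slow call.

```python
import asyncio

async def speak(text: str) -> None:
    # Stand-in for sending text to the TTS pipeline.
    print(f"agent: {text}")

async def check_inventory(sku: str) -> str:
    # Stand-in for a slow external API call (imagine ~5 s).
    await asyncio.sleep(0.05)
    return "in stock"

async def handle_inventory_question(sku: str) -> str:
    # Kick off the slow tool first, so it runs while the agent talks.
    task = asyncio.create_task(check_inventory(sku))
    await speak("I'll check that for you, it might take a moment...")
    result = await task  # the user heard the filler instead of silence
    await speak(f"Okay, that item is {result}.")
    return result

result = asyncio.run(handle_inventory_question("SKU-123"))
```

The key detail is creating the task *before* speaking the filler line, so the tool call and the speech overlap instead of running back-to-back.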

4. Prompt Engineering

A long system prompt takes longer to process on every turn (input-token latency). Keep your prompt focused, and remove unused instructions or examples.
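A rough size check helps catch prompt bloat before it shows up as latency. The ~4-characters-per-token figure below is a common heuristic for English text, not an exact tokenizer:

```python
def estimate_tokens(prompt: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(prompt) // 4)

short_prompt = "You are a concise support agent for Acme Plumbing."
long_prompt = short_prompt + " Example dialogue: ..." * 200

print(estimate_tokens(short_prompt), "vs", estimate_tokens(long_prompt))
```

For precise counts, use your LLM provider's own tokenizer; the heuristic is only meant to flag prompts that have grown by an order of magnitude.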