Latency is the “killer” of voice interfaces. A delay of more than 1 second feels unnatural. Butter AI is optimized for sub-second latency, but your configuration choices significantly impact it.
The Latency Budget
Total Latency = Network + STT + LLM + TTS + Network
You have control over the STT, LLM, and TTS choices.
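As a back-of-the-envelope sketch of this budget (the component numbers below are illustrative assumptions, not measured figures for any specific provider):

```python
# Hypothetical round-trip budget in milliseconds. Each entry is the
# time before the *first* byte/token of the next stage, since streaming
# pipelines overlap the tail of each stage with the head of the next.
budget_ms = {
    "network_in": 50,   # caller -> server
    "stt": 150,         # speech-to-text final transcript
    "llm": 350,         # time to first token
    "tts": 200,         # time to first audio byte
    "network_out": 50,  # server -> caller
}

total_ms = sum(budget_ms.values())
print(f"Total: {total_ms} ms")  # 800 ms -- inside the 1 s threshold
```

Shaving 100ms off any single component buys you real headroom against the 1-second ceiling.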
1. Provider Selection
STT (Speech to Text)
- Deepgram Nova-3 / Flux: Extremely fast and accurate (recommended).
- ElevenLabs Realtime Scribe v2: High quality transcription with low latency.
- Cartesia Ink Whisper: Optimized for fast conversational turns.
- AWS Transcribe: Reliable option for legacy integrations.
LLM (Language Model)
- Google Gemini 2.5 Flash Lite: Optimized for speed and cost-efficiency.
- Groq (Llama 4 Maverick): Ultra-low latency inference for simple tasks.
- OpenAI GPT-4o: State-of-the-art balance of intelligence and speed.
- Anthropic Claude 3.5 Haiku: Fast and capable reasoning.
TTS (Text to Speech)
- Cartesia: Designed specifically for sub-200ms real-time agents.
- ElevenLabs Turbo v2: Best-in-class quality with sub-300ms latency.
- Amazon Polly Generative: Consistent performance and quality.
2. Server Location
Ensure your SIP trunking and user base are geographically close to the server region you selected when creating your organization. Transatlantic round trips add 100ms+ of pure network latency.
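To sanity-check the network leg, you can estimate the round trip to a candidate region by timing a TCP handshake. This is a rough sketch (a TCP connect approximates one round trip; the host you probe is whatever endpoint is relevant to your deployment):

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443) -> float:
    """Rough RTT estimate: time to complete a TCP handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; we only care about the timing
    return (time.perf_counter() - start) * 1000
```

If this number is already 100ms+ before any AI work happens, consider a closer region.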
3. Tool Call Latency
If your agent calls a custom tool (e.g., check_inventory), the agent cannot speak until your API responds.
- Optimize your API: Ensure your endpoints respond in under 200ms.
- Async Handoff: If a task takes a long time (e.g., 5s), don’t make the user wait in silence.
- Bad: (Silence for 5s) -> “Done.”
- Good: “I’ll check that for you, it might take a moment…” -> (Tool Call) -> “Okay, I found it.”
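The "Good" pattern above can be sketched with asyncio: start the slow tool call first, speak the filler line while it runs, and only then await the result. `speak` and `check_inventory` are stand-ins for your own agent hooks, not Butter AI APIs:

```python
import asyncio

transcript: list[str] = []

async def speak(text: str) -> None:
    # Stand-in for the agent's TTS output.
    transcript.append(text)

async def check_inventory(sku: str) -> str:
    # Stand-in for a slow backend call (could be ~5 s in practice).
    await asyncio.sleep(0.1)
    return "in stock"

async def handle_inventory_question(sku: str) -> None:
    task = asyncio.create_task(check_inventory(sku))  # kick off the tool call
    await speak("I'll check that for you, it might take a moment...")
    status = await task  # only now do we wait on the slow call
    await speak(f"Okay, that item is {status}.")

asyncio.run(handle_inventory_question("SKU-123"))
```

The key detail is `create_task` before the filler line: the tool call and the speech overlap instead of running back to back.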
4. Prompt Engineering
A long system prompt takes longer to process (input token latency). Keep your prompt focused. Remove unused instructions or examples.
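A quick way to keep yourself honest is a rough size check on the prompt. The ~4 characters-per-token figure below is a common rule of thumb, not an exact tokenizer:

```python
def approx_tokens(prompt: str) -> int:
    """Back-of-the-envelope token estimate (~4 chars per token)."""
    return len(prompt) // 4

system_prompt = (
    "You are a concise phone agent for Acme Support. "  # hypothetical prompt
    "Answer in one or two short sentences."
)
print(f"~{approx_tokens(system_prompt)} tokens")
```

For exact counts, use your LLM provider's tokenizer; the point is to notice when a prompt has quietly grown by hundreds of tokens.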