Agents are the core entity in Butter AI. An agent defines who is speaking, how they speak, and what they know. You can sculpt your perfect agent by mixing and matching providers.

Configuration Components

An agent configuration consists of several key layers:

1. Transcriber (STT)

Converts user speech to text.
  • Providers: Deepgram (Flux, Nova 3, Nova 2), ElevenLabs (Realtime Scribe v2, Scribe v1), Cartesia Ink Whisper, AWS Transcribe.
  • Selection: Deepgram Nova 3 and Flux are generally recommended for the lowest latency and highest accuracy.

2. Intelligence (LLM)

The “brain” that decides what to say.
  • Providers: OpenAI, Groq, Google Gemini, Cerebras, Amazon Bedrock.
  • Models: Options range from Gemini 2.5 Flash Lite for high speed to Claude Sonnet 4 and Llama 4 Maverick for advanced reasoning.
  • Fallback Models: You can configure a fallback LLM provider and model (e.g., llm_fallback_provider and llm_fallback_model) to ensure high availability if the primary provider fails.
  • System Prompt: The most critical setting. It defines the agent's persona, guardrails, and instructions.
  • Temperature: Controls how random (and therefore how creative) the responses are. Lower values (around 0.3) suit strict, scripted tasks; higher values (0.7-0.9) suit casual conversation.
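
As an illustration of the fallback behavior described above, here is a minimal sketch in Python. Only llm_fallback_provider and llm_fallback_model are taken from this page; the other keys (llm_provider, llm_model, temperature), the placeholder values, and the call_llm callable are assumptions made for the example and are not Butter AI's actual API.

```python
from typing import Callable

# Only llm_fallback_provider and llm_fallback_model come from the docs above.
# Every other key and all values are hypothetical placeholders.
llm_config = {
    "llm_provider": "primary-provider",            # hypothetical key
    "llm_model": "primary-model",                  # hypothetical key
    "llm_fallback_provider": "fallback-provider",
    "llm_fallback_model": "fallback-model",
    "temperature": 0.3,                            # low value for a strict task
}

# Signature assumed for illustration: (provider, model, prompt, temperature) -> reply
LLMCall = Callable[[str, str, str, float], str]


def generate_with_fallback(config: dict, prompt: str, call_llm: LLMCall) -> str:
    """Try the primary LLM; if it raises, retry once with the fallback."""
    try:
        return call_llm(
            config["llm_provider"], config["llm_model"], prompt, config["temperature"]
        )
    except Exception:
        fallback_provider = config.get("llm_fallback_provider")
        fallback_model = config.get("llm_fallback_model")
        if not (fallback_provider and fallback_model):
            raise  # no fallback configured, so surface the original error
        return call_llm(
            fallback_provider, fallback_model, prompt, config["temperature"]
        )
```

In this sketch the fallback is tried only once and reuses the same temperature; a real platform may retry, time out, or switch providers differently.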

3. Synthesizer (TTS)

Converts text back to audio.
  • Providers: ElevenLabs, Cartesia, Amazon Polly (Neural and Generative).
  • Voice ID: The specific voice persona to use (e.g., “Clyde” from ElevenLabs, “Brooke” from Cartesia).

4. Language

Sets the agent's primary language (e.g., English). This determines which models and voices can handle the conversation natively.
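
To make the four layers concrete, the sketch below assembles them into one Python dictionary and checks that none is missing. The key names, the nesting, and the identifier strings (such as "nova-3" or "en") are assumptions chosen for illustration; they are not the documented Butter AI schema.

```python
# Hypothetical layout: one entry per layer described above.
REQUIRED_LAYERS = ("transcriber", "llm", "synthesizer", "language")

agent_config = {
    "transcriber": {"provider": "deepgram", "model": "nova-3"},   # placeholder IDs
    "llm": {
        "provider": "google",
        "model": "gemini-2.5-flash-lite",                         # placeholder ID
        "system_prompt": "You are a polite scheduling assistant.",
        "temperature": 0.3,
    },
    "synthesizer": {"provider": "cartesia", "voice_id": "Brooke"},
    "language": "en",                                              # placeholder code
}


def missing_layers(config: dict) -> list:
    """Return the names of required layers that are absent or empty."""
    return [layer for layer in REQUIRED_LAYERS if not config.get(layer)]
```

A check like this catches an incomplete configuration before any call is placed, for example an agent that has an LLM but no synthesizer.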

Capabilities

Speak First

If speak_first is set to true, the agent will initiate the conversation immediately upon connection. If false, it waits for the user to speak.

End Call Tool

Agents can be permitted to programmatically hang up the call when tasks are complete by enabling the end_call_tool flag.

Voicemail Detection

If enabled, the system uses Answering Machine Detection (AMD) to determine whether a human or a voicemail system answered the call.
  • Voicemail Message: Specific message to leave if a machine is detected.
  • Delay: Wait time before speaking to ensure the beep has passed.
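
The voicemail flow above can be sketched as a small decision function. This is an illustration under stated assumptions: the VoicemailSettings fields, the "machine"/"human" result labels, the action names, and the choice to hang up after leaving the message are all hypothetical and not taken from the Butter AI API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Action = Tuple[str, Optional[Union[str, float]]]


@dataclass
class VoicemailSettings:
    """Hypothetical container for the settings listed above."""
    enabled: bool
    message: str           # voicemail message to leave
    delay_seconds: float   # wait time so the beep has passed


def plan_call_opening(amd_result: str, settings: VoicemailSettings) -> List[Action]:
    """Return the ordered actions to take once AMD has classified the pickup."""
    if settings.enabled and amd_result == "machine":
        return [
            ("wait", settings.delay_seconds),  # let the beep finish first
            ("say", settings.message),         # leave the configured message
            ("hang_up", None),                 # assumption: end the call afterwards
        ]
    # Human pickup, or detection disabled: proceed with the normal conversation.
    return [("converse", None)]
```

For example, with a 2-second delay configured, a "machine" result yields wait, say, hang_up in that order, while a "human" result goes straight to the conversation.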

Recording

Call recording can be toggled on/off. Recordings are stored in S3 and accessible via the API.

Tools and Knowledge

Agents can be augmented with:
  • Knowledge Base: Passive knowledge (RAG) powered by Pinecone. Attach valid kb_document_ids directly to the agent.
  • Tools: Active capabilities (API calls) attached via custom_tools.
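
As a rough sketch of attaching both kinds of augmentation, the helper below adds kb_document_ids and custom_tools to an agent configuration held as a Python dictionary. The two field names come from this page; the helper itself, the shape of a tool entry (a dictionary with a "name"), and the example IDs are assumptions for illustration only.

```python
from typing import Any, Dict, List


def attach_knowledge_and_tools(
    agent: Dict[str, Any],
    kb_document_ids: List[str],
    custom_tools: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the agent config with KB documents and tools attached."""
    if any(not isinstance(doc_id, str) or not doc_id for doc_id in kb_document_ids):
        raise ValueError("kb_document_ids must be non-empty strings")
    if any("name" not in tool for tool in custom_tools):
        raise ValueError("each custom tool needs a 'name'")  # assumed tool shape

    updated = dict(agent)  # shallow copy; the caller's dict is left unchanged
    # dict.fromkeys drops duplicate IDs while preserving their order.
    updated["kb_document_ids"] = list(dict.fromkeys(kb_document_ids))
    updated["custom_tools"] = list(custom_tools)
    return updated


# Usage with made-up IDs and a made-up tool definition.
agent = attach_knowledge_and_tools(
    {"name": "support-agent"},
    kb_document_ids=["doc_1", "doc_1", "doc_2"],
    custom_tools=[{"name": "lookup_order", "description": "Fetch an order by ID"}],
)
```

Validating IDs locally only catches obviously malformed input; whether an ID is actually valid is determined by the platform.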