Agents are the core entity in Butter AI. An agent defines who is speaking, how it speaks, and what it knows. You can tailor an agent to your use case by mixing and matching providers at each layer.

Configuration Components

An agent configuration consists of several key layers:

1. Transcriber (STT)

Converts user speech to text.
  • Providers: Deepgram Nova-2 (Popular), Deepgram Base (Budget), AssemblyAI Nano.
  • Selection: Deepgram Nova-2 is generally recommended for the lowest latency and highest accuracy.

2. Intelligence (LLM)

The “brain” that decides what to say.
  • Providers: OpenAI (GPT-4o, GPT-4o Mini), Anthropic (Claude 3.5 Sonnet, Haiku), Google (Gemini 1.5 Flash), Groq (Llama 3).
  • System Prompt: This is the most critical part. It defines the persona, guardrails, and instructions.
  • Temperature: Controls response creativity. Lower values (around 0.3) suit strict, scripted tasks; higher values (0.7-0.9) suit casual conversation.

3. Synthesizer (TTS)

Converts text back to audio.
  • Providers: ElevenLabs Turbo v2 (Highest Quality), Cartesia Sonic (Fastest), Deepgram Aura (Budget), OpenAI TTS.
  • Voice ID: The specific voice persona to use (e.g., “Adam” from ElevenLabs, “Alloy” from OpenAI).
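Putting the three layers together, an agent configuration can be sketched as a single object. This is an illustrative sketch only: the field names and structure below (transcriber, llm, synthesizer, etc.) are assumptions, not Butter AI's actual schema.

```python
# Illustrative agent configuration covering all three layers.
# Field names and nesting are assumptions, not Butter AI's actual schema.
agent_config = {
    "transcriber": {
        "provider": "deepgram",
        "model": "nova-2",  # recommended: lowest latency, highest accuracy
    },
    "llm": {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "system_prompt": (
            "You are a friendly support agent for Acme Corp. "
            "Only answer questions about Acme products."
        ),
        "temperature": 0.3,  # strict support task, so keep creativity low
    },
    "synthesizer": {
        "provider": "elevenlabs",
        "model": "turbo-v2",
        "voice_id": "Adam",
    },
}
```

Each layer is independent, so swapping the synthesizer to Cartesia Sonic, for example, would not require touching the transcriber or LLM settings.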

Capabilities

Speak First

If speak_first is set to true, the agent will initiate the conversation immediately upon connection (e.g., “Hello, thanks for calling Acme Corp. How can I help?”). If false, it waits for the user to speak.
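The two greeting modes can be sketched as follows. The speak_first flag comes from the docs above; the companion first_message field is an assumption.

```python
# speak_first is documented above; "first_message" is an assumed field name
# for the greeting the agent opens with.
inbound_agent = {
    "speak_first": True,
    "first_message": "Hello, thanks for calling Acme Corp. How can I help?",
}

listener_agent = {
    "speak_first": False,  # agent stays silent until the caller speaks
}
```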

Voicemail Detection

If enabled, the system uses Answering Machine Detection (AMD) to determine if a human or voicemail picked up.
  • Voicemail Message: Specific message to leave if a machine is detected.
  • Delay: Wait time before speaking to ensure the beep has passed.
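A minimal sketch of the voicemail settings above; all key names here are assumptions, not Butter AI's documented schema.

```python
# Hypothetical voicemail-detection (AMD) settings; key names are assumptions.
voicemail_config = {
    "enabled": True,
    "voicemail_message": (
        "Hi, this is Acme Corp following up on your order. "
        "Please call us back at your convenience."
    ),
    "delay_seconds": 1.5,  # wait for the beep to pass before speaking
}
```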

Recording

Call recording can be toggled on/off. Recordings are stored in S3 and accessible via the API.
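Fetching a recording might look like the sketch below. The base URL, endpoint path, auth header, and response field are all assumptions, since the source does not document the API surface.

```python
import json
import urllib.request

API_BASE = "https://api.butter.ai/v1"  # assumed base URL

def build_recording_request(call_id: str, api_key: str) -> urllib.request.Request:
    """Build the (assumed) GET request for a call's recording."""
    return urllib.request.Request(
        f"{API_BASE}/calls/{call_id}/recording",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def fetch_recording_url(call_id: str, api_key: str) -> str:
    """Fetch the recording; assumes the response carries an S3 presigned URL."""
    with urllib.request.urlopen(build_recording_request(call_id, api_key)) as resp:
        return json.load(resp)["recording_url"]  # assumed field name
```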

Tools and Knowledge

Agents can be augmented with:
  • Knowledge Base: Passive knowledge (RAG) powered by Pinecone.
  • Tools: Active capabilities (API calls).
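The distinction above can be sketched in code: a tool is an active capability the LLM can invoke, while a knowledge base is passive context retrieved per turn. The tool shape below follows the common JSON-schema function-calling convention; whether Butter AI uses this exact shape, and the index name, are assumptions.

```python
# Illustrative tool definition (JSON-schema function-calling style).
# Whether Butter AI accepts this exact shape is an assumption.
check_order_tool = {
    "name": "check_order_status",
    "description": "Look up the status of a customer's order by order ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The customer's order ID.",
            },
        },
        "required": ["order_id"],
    },
}

# Passive RAG layer; Pinecone is named in the docs, the index name is assumed.
knowledge_base = {
    "provider": "pinecone",
    "index": "acme-product-docs",
}
```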