Agents are the core entity in Butter AI. An agent defines who is speaking, how they speak, and what they know. You can sculpt your perfect agent by mixing and matching providers.
Configuration Components
An agent configuration consists of several key layers:
1. Transcriber (STT)
Converts user speech to text.
- Providers: Deepgram Nova-2 (Popular), Deepgram Base (Budget), AssemblyAI Nano.
- Selection: Deepgram Nova-2 is generally recommended for the lowest latency and highest accuracy.
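As a sketch only (the Butter AI configuration schema is not shown here, so the `provider` and `model` keys are illustrative assumptions), a transcriber selection could be expressed and validated like this:

```python
# Hypothetical transcriber configuration -- field names are illustrative,
# not taken from the Butter AI API.
transcriber = {
    "provider": "deepgram",  # assumed provider key
    "model": "nova-2",       # recommended: lowest latency, highest accuracy
}

def validate_transcriber(cfg: dict) -> bool:
    """Check that the config names one of the documented STT options."""
    known = {
        ("deepgram", "nova-2"),     # Popular
        ("deepgram", "base"),       # Budget
        ("assemblyai", "nano"),
    }
    return (cfg.get("provider"), cfg.get("model")) in known
```

A helper like `validate_transcriber` is a reasonable guard to run before creating an agent, since a typo in a model name would otherwise only surface at call time.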
2. Intelligence (LLM)
The “brain” that decides what to say.
- Providers: OpenAI (GPT-4o, GPT-4o Mini), Anthropic (Claude 3.5 Sonnet, Haiku), Google (Gemini 1.5 Flash), Groq (Llama 3).
- System Prompt: This is the most critical part. It defines the persona, guardrails, and instructions.
- Temperature: Controls creativity. Lower values (around 0.3) suit strict, deterministic tasks; higher values (0.7-0.9) suit casual conversation.
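Putting the prompt and temperature guidance together, a minimal sketch of the intelligence layer might look like this (the key names and the `pick_temperature` helper are illustrative assumptions, not the Butter AI schema):

```python
# Hypothetical intelligence-layer configuration -- keys are illustrative.
llm = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "system_prompt": (
        "You are a support agent for Acme Corp. "      # persona
        "Only answer questions about Acme products; "  # guardrail
        "politely decline anything else."               # instruction
    ),
    "temperature": 0.3,
}

def pick_temperature(task: str) -> float:
    """Map the guidance above onto a default temperature."""
    return 0.3 if task == "strict" else 0.8  # 0.7-0.9 band for casual chat

llm["temperature"] = pick_temperature("strict")
```

Keeping persona, guardrails, and instructions in one system prompt string mirrors the advice above that the prompt is the most critical part of the configuration.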
3. Synthesizer (TTS)
Converts text back to audio.
- Providers: ElevenLabs Turbo v2 (Highest Quality), Cartesia Sonic (Fastest), Deepgram Aura (Budget), OpenAI TTS.
- Voice ID: The specific voice persona to use (e.g., “Adam” from ElevenLabs, “Alloy” from OpenAI).
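The provider trade-offs above can be captured as a simple lookup of default voices. This is a sketch under assumptions: the config keys and the default-voice pairings are illustrative, not documented Butter AI or provider values:

```python
# Hypothetical synthesizer configuration -- keys are illustrative.
synthesizer = {
    "provider": "elevenlabs",
    "model": "turbo-v2",
    "voice_id": "Adam",  # provider-specific voice persona
}

# Default voice per provider, following the trade-offs in the text.
DEFAULT_VOICE = {
    "elevenlabs": "Adam",   # highest quality
    "cartesia": "sonic",    # fastest
    "deepgram": "aura",     # budget
    "openai": "alloy",
}

def voice_for(provider: str) -> str:
    """Return a default voice ID, falling back to a safe choice."""
    return DEFAULT_VOICE.get(provider, "alloy")
```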
Capabilities
Speak First
If speak_first is set to true, the agent will initiate the conversation immediately upon connection (e.g., “Hello, thanks for calling Acme Corp. How can I help?”). If false, it waits for the user to speak.
Voicemail Detection
If enabled, the system uses Answering Machine Detection (AMD) to determine if a human or voicemail picked up.
- Voicemail Message: Specific message to leave if a machine is detected.
- Delay: Wait time before speaking to ensure the beep has passed.
Recording
Call recording can be toggled on/off. Recordings are stored in S3 and accessible via the API.
Augmentations
Agents can be augmented with:
- Knowledge Base: Passive knowledge (RAG) powered by Pinecone.
- Tools: Active capabilities (API calls).
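Both augmentations could sit side by side in the agent definition. This is a sketch under stated assumptions: the `knowledge_base`/`tools` shape, the index name, and the `book_appointment` tool are all hypothetical illustrations, not the documented schema:

```python
# Hypothetical augmentation config -- shape is illustrative.
agent = {
    "knowledge_base": {
        "provider": "pinecone",   # passive knowledge via RAG
        "index": "acme-docs",     # assumed index name
    },
    "tools": [
        {
            "name": "book_appointment",         # illustrative tool
            "url": "https://example.com/book",  # API the agent may call
            "method": "POST",
        }
    ],
}

def tool_names(cfg: dict) -> list[str]:
    """List the active capabilities the agent can invoke."""
    return [t["name"] for t in cfg.get("tools", [])]
```

The distinction in the list above carries through here: the knowledge base is consulted passively during generation, while tools are actions the LLM chooses to invoke.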