Configuration Components
An agent configuration consists of several key layers:
1. Transcriber (STT)
Converts user speech to text.
- Providers: Deepgram (Flux, Nova 3, Nova 2), ElevenLabs (Realtime Scribe v2, Scribe v1), Cartesia Ink Whisper, AWS Transcribe.
- Selection: Deepgram Nova 3 and Flux are generally recommended for the lowest latency and highest accuracy.
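As a rough sketch, the provider and model options above can be kept in a lookup table and checked before a configuration is saved. The identifier strings (such as "nova-3") and the helper names are hypothetical; the platform's real model identifiers may differ.

```python
# Hypothetical catalog of the STT providers and models listed above.
# The string identifiers are illustrative, not the platform's real values.
STT_CATALOG: dict[str, set[str]] = {
    "deepgram": {"flux", "nova-3", "nova-2"},
    "elevenlabs": {"realtime-scribe-v2", "scribe-v1"},
    "cartesia": {"ink-whisper"},
    "aws": {"transcribe"},
}

# Pairs the section recommends for the lowest latency and highest accuracy.
RECOMMENDED_STT = {("deepgram", "nova-3"), ("deepgram", "flux")}


def validate_transcriber(provider: str, model: str) -> bool:
    """Return True if the provider/model pair appears in the catalog."""
    return model in STT_CATALOG.get(provider, set())


def is_recommended(provider: str, model: str) -> bool:
    """Return True if the pair is one of the recommended options."""
    return (provider, model) in RECOMMENDED_STT


if __name__ == "__main__":
    print(validate_transcriber("deepgram", "nova-3"))     # True
    print(validate_transcriber("deepgram", "scribe-v1"))  # False
    print(is_recommended("deepgram", "flux"))             # True
```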
2. Intelligence (LLM)
The “brain” that decides what to say.
- Providers: OpenAI, Groq, Google Gemini, Cerebras, Amazon Bedrock.
- Models: Options range from Gemini 2.5 Flash Lite for high speed to Claude Sonnet 4 and Llama 4 Maverick for advanced reasoning.
- Fallback Models: You can configure a fallback LLM provider and model (llm_fallback_provider and llm_fallback_model) so the agent stays available if the primary provider fails.
- System Prompt: The most critical part of the configuration. It defines the agent's persona, guardrails, and instructions.
- Temperature: Controls how varied (creative) responses are. Lower values (around 0.3) suit strict, task-focused agents; higher values (0.7-0.9) suit casual conversation.
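The fallback behavior can be pictured as a wrapper that tries the primary model and, if the call fails, retries once against llm_fallback_provider and llm_fallback_model. This is a minimal sketch under stated assumptions: the llm_provider and llm_model field names, ProviderError, and the call_llm callable are hypothetical stand-ins, and which failures actually trigger failover on the platform is not specified here.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error raised when an LLM provider call fails."""


@dataclass
class LLMConfig:
    llm_provider: str            # hypothetical name for the primary provider field
    llm_model: str               # hypothetical name for the primary model field
    llm_fallback_provider: str
    llm_fallback_model: str
    temperature: float = 0.3


# A provider call: (provider, model, prompt, temperature) -> reply text.
LLMCall = Callable[[str, str, str, float], str]


def generate_with_fallback(cfg: LLMConfig, prompt: str, call_llm: LLMCall) -> str:
    """Try the primary provider; on ProviderError, retry once with the fallback."""
    try:
        return call_llm(cfg.llm_provider, cfg.llm_model, prompt, cfg.temperature)
    except ProviderError:
        return call_llm(
            cfg.llm_fallback_provider, cfg.llm_fallback_model, prompt, cfg.temperature
        )


if __name__ == "__main__":
    def flaky_call(provider: str, model: str, prompt: str, temperature: float) -> str:
        # Simulate an outage at the primary provider.
        if provider == "openai":
            raise ProviderError("primary unavailable")
        return f"{provider}/{model}: hello"

    cfg = LLMConfig("openai", "primary-model", "groq", "fallback-model")
    print(generate_with_fallback(cfg, "Say hello", flaky_call))
    # prints "groq/fallback-model: hello"
```

Passing the provider call in as a function keeps the failover logic testable without network access; a production client would also want timeouts so that a hung primary provider triggers the fallback rather than stalling the call.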
3. Synthesizer (TTS)
Converts text back to audio.
- Providers: ElevenLabs, Cartesia, Amazon Polly (Neural and Generative).
- Voice ID: The specific voice persona to use (e.g., “Clyde” from ElevenLabs, “Brooke” from Cartesia).
4. Language
Sets the agent's primary language (e.g., English), which determines which models and voices natively support its conversations.
Capabilities
Speak First
If speak_first is set to true, the agent will initiate the conversation immediately upon connection. If false, it waits for the user to speak.
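Put differently, speak_first decides who owns the first turn. A minimal sketch; first_action and its greeting argument are hypothetical names used only for illustration:

```python
def first_action(speak_first: bool, greeting: str) -> tuple[str, str]:
    """Decide what the agent does immediately after the call connects.

    Returns ("speak", greeting) when the agent opens the conversation,
    or ("listen", "") when it waits for the user's first utterance.
    """
    if speak_first:
        return ("speak", greeting)
    return ("listen", "")
```

For example, first_action(True, "Hi, how can I help?") yields a "speak" action carrying the greeting, while first_action(False, ...) yields a "listen" action and the greeting goes unused.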
End Call Tool
Enabling the end_call_tool flag lets the agent hang up the call programmatically once its task is complete.
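One common way to give an LLM this ability is to expose hanging up as a function-calling tool that the runtime executes when the model invokes it. The sketch below assumes an OpenAI-style tool schema; the tool name end_call, its reason argument, and the hang_up callback are hypothetical, since the platform's internal schema behind end_call_tool is not described here.

```python
import json
from typing import Callable

# Hypothetical tool definition in the OpenAI function-calling format.
END_CALL_TOOL = {
    "type": "function",
    "function": {
        "name": "end_call",
        "description": "Hang up the call once the caller's request is fully handled.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {"type": "string", "description": "Why the call is ending."}
            },
            "required": ["reason"],
        },
    },
}


def build_tools(end_call_tool: bool) -> list[dict]:
    """Offer the end_call tool to the model only when the flag is enabled."""
    return [END_CALL_TOOL] if end_call_tool else []


def handle_tool_call(name: str, arguments: str, hang_up: Callable[[str], None]) -> bool:
    """Dispatch a model tool call; return True if the call was ended."""
    if name == "end_call":
        args = json.loads(arguments)  # tool arguments arrive as a JSON string
        hang_up(args.get("reason", ""))
        return True
    return False
```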
Voicemail Detection
If enabled, the system uses Answering Machine Detection (AMD) to determine whether a human or a voicemail system answered the call.
- Voicemail Message: The message to leave if a machine is detected.
- Delay: Wait time before speaking to ensure the beep has passed.
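The decision flow after the call is answered can be sketched as below. The AMD result label "machine", the field names in VoicemailConfig, and the assumption that the delay is measured in seconds are all hypothetical; real AMD services report their own result codes. Treating any non-machine result like a human answer is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass
class VoicemailConfig:
    voicemail_detection: bool        # hypothetical flag name
    voicemail_message: str           # message to leave on a machine
    voicemail_delay_seconds: float   # hypothetical; units assumed to be seconds


def plan_after_answer(cfg: VoicemailConfig, amd_result: str) -> list[tuple[str, object]]:
    """Return the ordered actions to take once the call is answered."""
    if not cfg.voicemail_detection or amd_result != "machine":
        # Human (or undetermined) answer: continue with the normal conversation.
        return [("converse", None)]
    return [
        ("wait", cfg.voicemail_delay_seconds),  # let the beep pass before speaking
        ("speak", cfg.voicemail_message),
    ]
```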
Recording
Call recording can be toggled on or off. Recordings are stored in Amazon S3 and can be retrieved via the API.
Tools and Knowledge
Agents can be augmented with:
- Knowledge Base: Passive knowledge retrieved through RAG (retrieval-augmented generation), powered by Pinecone. Attach valid document IDs directly to the agent via kb_document_ids.
- Tools: Active capabilities (API calls) attached via custom_tools.
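Putting the layers together, an agent configuration might look like the dictionary below. Only speak_first, end_call_tool, llm_fallback_provider, llm_fallback_model, kb_document_ids, and custom_tools appear in this section; every other key name, the voice and document ID values, and the shape of the custom_tools entry are hypothetical placeholders. The check_config helper is likewise an illustrative sketch of client-side sanity checks, not the platform's own validation.

```python
from typing import Any

# Hypothetical agent configuration; see the lead-in for which keys are assumed.
agent_config: dict[str, Any] = {
    "language": "en",
    "transcriber_provider": "deepgram",
    "transcriber_model": "nova-3",
    "llm_provider": "gemini",
    "llm_model": "gemini-2.5-flash-lite",
    "llm_fallback_provider": "groq",
    "llm_fallback_model": "llama-4-maverick",
    "system_prompt": "You are a friendly scheduling assistant. Only discuss appointments.",
    "temperature": 0.3,
    "synthesizer_provider": "elevenlabs",
    "voice_id": "clyde-voice-id",  # placeholder, not a real voice ID
    "speak_first": True,
    "end_call_tool": True,
    "recording": True,
    "kb_document_ids": ["doc-123"],  # placeholder document ID
    "custom_tools": [
        {"name": "check_availability", "url": "https://example.com/availability"}
    ],
}


def check_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    if not cfg.get("system_prompt"):
        problems.append("system_prompt is empty")
    temp = cfg.get("temperature")
    # Assumed bounds for this sketch; the accepted range varies by provider.
    if not isinstance(temp, (int, float)) or not 0.0 <= temp <= 2.0:
        problems.append("temperature should be a number between 0.0 and 2.0")
    for flag in ("speak_first", "end_call_tool"):
        if not isinstance(cfg.get(flag), bool):
            problems.append(f"{flag} must be true or false")
    # A fallback provider without a fallback model (or vice versa) is incomplete.
    if bool(cfg.get("llm_fallback_provider")) != bool(cfg.get("llm_fallback_model")):
        problems.append("set llm_fallback_provider and llm_fallback_model together")
    if not isinstance(cfg.get("kb_document_ids", []), list):
        problems.append("kb_document_ids must be a list")
    return problems
```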