Writing for voice agents is fundamentally different from writing for text chatbots. In a voice conversation, the user cannot “scan” the text; they must listen linearly. Long, complex sentences increase cognitive load and latency.

1. Be Concise

Bad:
“Thank you for calling Acme Corp, the leading provider of widgets in the tri-state area. I understand you are calling about an order status. I can certainly help you with that today. Could you please provide your order number?”
Why it fails:
  • Takes 10 seconds to say.
  • The user forgets the question by the end.
  • Increases latency (more tokens to generate).
Good:
“Thanks for calling Acme. I can help with your order. What’s the order number?”
Tip: Add “Keep your responses short and conversational, ideally under 20 words.” to your system prompt.
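A prompt instruction alone does not guarantee short replies, so it can help to measure them during testing. The following is a minimal sketch, assuming a simple whitespace word count is a good enough proxy for spoken length; `MAX_WORDS`, `word_count`, and `within_budget` are hypothetical names, not part of any voice platform's API.

```python
# Hypothetical helper for testing reply length; not part of any platform API.
MAX_WORDS = 20  # the budget suggested in the tip above


def word_count(text: str) -> int:
    """Count whitespace-separated words as a rough proxy for spoken length."""
    return len(text.split())


def within_budget(text: str, max_words: int = MAX_WORDS) -> bool:
    """Return True when the reply fits the word budget."""
    return word_count(text) <= max_words


bad = (
    "Thank you for calling Acme Corp, the leading provider of widgets in the "
    "tri-state area. I understand you are calling about an order status. "
    "I can certainly help you with that today. "
    "Could you please provide your order number?"
)
good = "Thanks for calling Acme. I can help with your order. What's the order number?"

bad_fits = within_budget(bad)    # False: 39 words
good_fits = within_budget(good)  # True: 14 words
```

A check like this could run over logged agent replies to flag turns that drift past the budget.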

2. Avoid Special Characters

LLMs sometimes output markdown, lists, or URLs. These sound terrible when read by TTS. Instruction:
“Do not use markdown, bullet points, or emojis. Do not read out URLs; instead say ‘check our website’.”
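Because the model may still slip occasionally, a safety net between the LLM and the TTS engine can help. The sketch below is one possible approach, not a complete markdown parser: it assumes regex-based cleanup is acceptable, that replacing URLs with the phrase “our website” suits your agent, and that the emoji ranges listed cover the cases you care about. `clean_for_tts` is a hypothetical name.

```python
import re

# Markdown links: keep the label, drop the target.
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\([^)]*\)")
# Bare URLs, leaving any trailing sentence punctuation in place.
URL_RE = re.compile(r"(?:https?://|www\.)\S+?(?=[.,;:!?)]*(?:\s|$))")
# Heading markers and list markers at the start of a line.
HEADING_RE = re.compile(r"^[ \t]*#{1,6}[ \t]+", re.MULTILINE)
BULLET_RE = re.compile(r"^[ \t]*(?:[-*+•]|\d+[.)])[ \t]+", re.MULTILINE)
# Emphasis and inline-code markers (this also strips underscores).
EMPHASIS_RE = re.compile(r"[*_`~]+")
# A rough emoji range; an assumption, not an exhaustive list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_for_tts(text: str) -> str:
    """Strip common markdown, URLs, and emojis before sending text to TTS."""
    text = MD_LINK_RE.sub(r"\1", text)
    text = URL_RE.sub("our website", text)
    text = HEADING_RE.sub("", text)
    text = BULLET_RE.sub("", text)
    text = EMPHASIS_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    # Collapse leftover whitespace and line breaks into single spaces.
    return re.sub(r"\s+", " ", text).strip()


raw = "**Great news!** \U0001F389\n- Track it at https://acme.example/orders.\n- Or call us."
spoken = clean_for_tts(raw)
# spoken == "Great news! Track it at our website. Or call us."
```

The prompt instruction remains the first line of defense; the filter only catches what gets through.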

3. Handle Interruptions

Users will interrupt. Design your prompt to handle context switching gracefully. Instruction:
“If the user interrupts or changes the topic, stop what you are saying and address their new point immediately.”

4. Pacing and Fillers

To make the agent sound more human, you can instruct it to use fillers, but use them sparingly. Instruction:
“Use natural fillers like ‘hmm’ or ‘let me check’ only when retrieving data via a tool.”

5. Phone Number Formatting

TTS engines sometimes read phone numbers as “one million…” instead of “one, zero, zero…”. Instruction:
“When speaking phone numbers, say them digit by digit, grouping them for clarity (e.g., say 555-123-4567 as ‘five five five, one two three, four five six seven’).”
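If the number comes from your own data rather than from the model, you can also spell it out in code before it reaches the TTS engine. This is a minimal sketch, assuming English digit names and numbers that already contain separators such as hyphens or spaces; `speak_phone_number` is a hypothetical helper, and whether commas produce a pause depends on the TTS engine.

```python
import re

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def speak_phone_number(number: str) -> str:
    """Spell a phone number digit by digit, keeping its existing grouping."""
    # Each run of ASCII digits becomes one spoken group.
    groups = re.findall(r"[0-9]+", number)
    spoken_groups = [" ".join(DIGIT_WORDS[d] for d in group) for group in groups]
    # Commas between groups hint at a short pause.
    return ", ".join(spoken_groups)


spoken = speak_phone_number("555-123-4567")
# spoken == "five five five, one two three, four five six seven"
```

A number with no separators comes back as a single long group, so you may want to split it into groups first.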