Retrieval-Augmented Generation (RAG) for voice is more demanding than for text. The retrieved context must be high-quality and concise to keep latency low and responses natural.
1. Chunking Strategy
For voice agents, we recommend smaller chunk sizes (200-400 tokens). Large chunks increase the LLM’s “thinking” time (latency) and may include irrelevant information that makes the agent ramble.
Best Practices:
- Overlap: Use a 10-15% overlap between chunks so that context isn’t lost at the boundaries.
- Semantic Splitting: Where possible, split by paragraph or section rather than at a strict character count (see the sketch below).
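To make these guidelines concrete, here is a simplified chunker sketch (illustrative only, not our production pipeline). It packs paragraphs into ~300-token chunks and carries a ~10% overlap across boundaries. Whitespace-separated words stand in for tokens here; in practice, count with the tokenizer that matches your embedding model.

```python
def chunk_document(text: str, max_tokens: int = 300,
                   overlap_ratio: float = 0.1) -> list[str]:
    """Pack paragraphs into ~max_tokens chunks with overlap.

    Words approximate tokens for simplicity. Assumes individual
    paragraphs are shorter than max_tokens; pre-split any that aren't.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    overlap = int(max_tokens * overlap_ratio)  # ~30 words at the defaults
    chunks: list[str] = []
    current: list[str] = []

    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one so
            # context isn't lost at the boundary (the 10-15% overlap rule).
            current = current[-overlap:]
        current.extend(words)

    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on blank lines first keeps paragraphs intact, so a chunk boundary falls between ideas rather than mid-sentence.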
2. Context Injection
In Butter AI, when a document is attached to an agent, relevant chunks are injected into the system prompt at runtime.
Prompting for RAG:
To prevent the agent from sounding like a robot reading a manual, add these instructions to your agent’s system prompt:
“Use the provided context to answer questions, but speak naturally. Do not say ‘According to the document…’ or ‘The context says…’. Just provide the answer directly and concisely.”
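As a simplified illustration of this injection pattern (not our production code), the runtime prompt assembly looks roughly like the sketch below; `build_system_prompt` and the variable names are ours for illustration:

```python
RAG_STYLE_RULES = (
    "Use the provided context to answer questions, but speak naturally. "
    "Do not say 'According to the document...' or 'The context says...'. "
    "Just provide the answer directly and concisely."
)

def build_system_prompt(base_prompt: str, retrieved_chunks: list[str]) -> str:
    """Assemble the runtime system prompt: agent persona first, then the
    style rules, then the retrieved context appended at the end."""
    context = "\n\n".join(retrieved_chunks)
    return f"{base_prompt}\n\n{RAG_STYLE_RULES}\n\nContext:\n{context}"
```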
3. Optimizing for Speed
Every token of retrieved context adds to the LLM’s input processing time.
- Top-K Selection: By default, Butter AI retrieves the top 3 most relevant chunks. For very fast agents, you might reduce this to 1 or 2 (see the sketch below).
- Filtering: Ensure your documents are clean. Remove headers, footers, and page numbers from PDFs before uploading, as these create “noise” in the retrieval results.
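Under the hood, top-K selection is a similarity ranking over chunk embeddings. Here is a minimal sketch, assuming you already have unit-normalized embeddings; the function and argument names are ours for illustration:

```python
import numpy as np

def top_k_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray,
                 chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query.

    chunk_vecs: (n_chunks, dim) matrix of unit-normalized embeddings.
    query_vec: (dim,) unit-normalized query embedding. With normalized
    vectors, the dot product equals cosine similarity.
    """
    scores = chunk_vecs @ query_vec          # one similarity score per chunk
    best = np.argsort(scores)[::-1][:k]      # indices, highest score first
    return [chunks[i] for i in best]
```

Calling this with `k=1` or `k=2` is the latency-saving move described above: fewer retrieved chunks means fewer input tokens for the LLM to process.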
4. Hybrid Search (Upcoming)
We are working on hybrid search (combining keyword and vector search) to improve the retrieval of specific proper nouns, SKU numbers, or technical codes that might be missed by semantic-only search.
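For background while this work is in progress: a common way to merge keyword and vector results is reciprocal rank fusion (RRF). The sketch below shows the general technique only; it is not a preview of our implementation.

```python
def reciprocal_rank_fusion(keyword_ranking: list[str],
                           vector_ranking: list[str],
                           k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with reciprocal rank fusion.

    Each chunk scores sum(1 / (k + rank)) across the rankings it appears
    in; k=60 is the conventional smoothing constant. A chunk ranked well
    by either keyword or vector search surfaces near the top, which is
    how exact matches like SKUs survive alongside semantic hits.
    """
    scores: dict[str, float] = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```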