Max Mode (MASS-RAG)
Deeper, more accurate answers via multi-agent synthesis.
Max Mode is Pegasus's advanced retrieval-and-reasoning pipeline. Instead of one LLM call against retrieved chunks, Max Mode runs a cross-encoder reranker plus three parallel filter agents and one synthesis agent. The result is more accurate answers on complex, multi-section, or contradictory documents — at the cost of more tokens and higher latency.
Max Mode requires the Pro or Ultra plan. Free plan users see "Requires Pro or Ultra plan to use Max Mode."
When to enable Max Mode
Use Max Mode when:
- Questions require synthesis across multiple sections of a document.
- The knowledge base contains contradictory or conflicting information.
- The bot must perform multi-hop inference (A → B → C → conclusion).
Stick with Standard RAG when:
- Questions are simple and factual (answered in 1–2 paragraphs).
- Latency and cost are top priorities.
- Documents are short and low-noise.
How Max Mode works
Vector retrieve (top 30)
↓
Cross-encoder reranker → top K
↓
3 filter agents in parallel:
- Summarizer (high-level overview)
- Extractor (verbatim quotes — anti-hallucination guardrail)
- Reasoner (cross-section inference)
↓
Synthesis agent → final answer
Cross-encoder reranker
Vector search is fast but ranks chunks by embedding similarity — it can promote a table of contents over the dense paragraph that actually answers the question. The reranker reads the query and each candidate chunk together with a cross-encoder model, producing a sharper relevance score. The top K chunks are forwarded; the rest are dropped before the LLM ever sees them.
Benefit: dramatically higher precision in the context window.
Trade-off: adds one extra service call per request. If the rerank service is unavailable, the system falls back to vector ordering — chat keeps working.
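The rerank-then-fall-back behavior described above can be sketched roughly as follows. This is a minimal illustration, not Pegasus's actual implementation: `rerank_with_fallback`, `score_pairs`, and `toy_overlap_scorer` are hypothetical names, and the toy scorer only counts shared words as a stand-in for a real cross-encoder model.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: takes (query, chunk) pairs, returns one score per pair.
PairScorer = Callable[[list[tuple[str, str]]], list[float]]


def rerank_with_fallback(
    query: str,
    chunks: Sequence[str],
    score_pairs: PairScorer,
    top_k: int = 5,
) -> list[str]:
    """Return the top_k chunks by reranker score, or by vector order if scoring fails."""
    pairs = [(query, chunk) for chunk in chunks]
    try:
        scores = score_pairs(pairs)
        if len(scores) != len(pairs):
            raise ValueError("scorer returned the wrong number of scores")
    except Exception:
        # Rerank service unavailable: keep the original vector-search order.
        return list(chunks[:top_k])
    # Sort by score descending; ties keep their vector-search order.
    ranked = sorted(range(len(chunks)), key=lambda i: (-scores[i], i))
    return [chunks[i] for i in ranked[:top_k]]


def toy_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Stand-in for a cross-encoder (assumption): counts shared lowercase words."""
    return [
        float(len(set(q.lower().split()) & set(c.lower().split())))
        for q, c in pairs
    ]
```

In a real deployment the scorer would wrap a call to a cross-encoder model or rerank service; the important part of the sketch is that any failure in that call leaves the vector ordering intact, so chat keeps working.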
MASS-RAG multi-agent
Three filter agents process the same context independently:
- Summarizer — produces a high-level overview of the retrieved context.
- Extractor — pulls verbatim quotes to anchor the answer in source material (anti-hallucination guardrail).
- Reasoner — performs cross-section inference and identifies relationships.
The synthesis agent never sees raw chunks — only the distilled outputs from the three filter agents. It combines them into the final answer.
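The fan-out and fan-in described above can be sketched with `asyncio`. This is an illustrative sketch under stated assumptions: the `llm` callable, the prompt wording, and the function names are all hypothetical and are not taken from Pegasus itself.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical async LLM client: prompt in, generated text out.
LLM = Callable[[str], Awaitable[str]]

# Illustrative instructions for the three filter agents (wording is an assumption).
FILTER_PROMPTS = {
    "summarizer": "Summarize the context at a high level.",
    "extractor": "Quote verbatim the passages that answer the question.",
    "reasoner": "Infer relationships across sections of the context.",
}


async def run_filter(role: str, question: str, context: str, llm: LLM) -> tuple[str, str]:
    """Run one filter agent over the full retrieved context."""
    prompt = f"{FILTER_PROMPTS[role]}\n\nQuestion: {question}\n\nContext:\n{context}"
    return role, await llm(prompt)


async def max_mode_answer(question: str, chunks: list[str], llm: LLM) -> str:
    """Three parallel filter calls, then one synthesis call (4 LLM calls total)."""
    context = "\n\n".join(chunks)
    # Fan out: every filter agent sees the same context independently.
    results = await asyncio.gather(
        *(run_filter(role, question, context, llm) for role in FILTER_PROMPTS)
    )
    # Fan in: the synthesis prompt contains only the distilled outputs, never raw chunks.
    distilled = "\n\n".join(f"[{role}]\n{text}" for role, text in results)
    synthesis_prompt = (
        "Combine the notes below into one answer.\n\n"
        f"Question: {question}\n\n{distilled}"
    )
    return await llm(synthesis_prompt)
```

Running the filters with `asyncio.gather` means the three filter calls overlap in time, so the added latency is closer to one filter call plus the synthesis call than to four sequential calls.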
Benefit: fewer "lost in the middle" failures, because noise is filtered before synthesis; lower hallucination risk via verbatim extraction; multi-hop reasoning across document sections; better handling of contradictions.
Trade-off: ~5–7× more tokens per request (4 LLM calls instead of 1), and latency is 2–3× that of Standard RAG. Not ideal for simple, single-paragraph questions.
Enabling Max Mode on a bot
1. Open Source settings on the bot. The "Max Mode (MASS-RAG)" section is below the system prompt.
2. Toggle Max Mode on. Switch the toggle to enabled. Toast confirms: "Max Mode enabled."
3. Use it in chat. In the Chat Workspace input bar, you'll see a Max Mode toggle. Enable it per-conversation to use Max Mode for that conversation. See Max Mode in chat.
Cost awareness
Because Max Mode uses ~5–7× more tokens per chat message, you reach your daily credit limit much faster. Plan accordingly: use it for the questions that benefit, not as a default.
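To get a feel for the impact, the arithmetic can be sketched as below. The credit figures are hypothetical, and the sketch assumes credits are consumed in proportion to tokens, which this page does not confirm; only the 5–7× multiplier comes from the text above.

```python
def max_mode_messages(
    daily_credits: float,
    standard_cost: float,
    multiplier_range: tuple[float, float] = (5.0, 7.0),
) -> tuple[int, int]:
    """Estimate (worst case, best case) Max Mode messages per day.

    daily_credits and standard_cost are hypothetical inputs; the default
    multiplier range is the ~5-7x token overhead quoted for Max Mode.
    """
    low_mult, high_mult = multiplier_range
    worst = int(daily_credits // (standard_cost * high_mult))
    best = int(daily_credits // (standard_cost * low_mult))
    return worst, best


# Example: a budget that covers 100 Standard RAG messages per day.
print(max_mode_messages(daily_credits=100, standard_cost=1))
```

Under these assumptions, a budget that covers 100 Standard RAG messages covers roughly 14 to 20 Max Mode messages.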