Parse providers
Choose how Pegasus reads your documents during training.
Parsing is the first phase of training: Pegasus extracts text and structure from your raw documents and converts them to Markdown. The bot's vector index is built from that Markdown, so parsing quality carries through to what the bot can find. Different parsers handle the same document differently; choose the one that fits your content.
Pegasus offers two parsers
| Provider | Best for | Trade-off |
|---|---|---|
| Marker (Slower & Secure) | Sensitive content, complex PDFs with tables or figures, scanned documents | Slower: runs additional OCR and layout analysis |
| LlamaParse (Fast & Stable) | Clean text documents, simple structure, fast turnaround | Less reliable on scanned PDFs and visually complex layouts |
You select the parser with the "Parse Provider" selector in the Source settings tab.
When to pick Marker
- The document is a scanned PDF (image-based) and you need accurate text.
- The document has dense tables, formulas, or figures that carry meaning.
- The content is sensitive and you prefer keeping parsing on the local pipeline.
Expect training to take noticeably longer.
When to pick LlamaParse
- The document is text-native (exported from Word, generated from a template, or already Markdown).
- You want training to finish as quickly as possible.
- The visual layout is simple — paragraphs and headings.
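The two checklists above can be read as a simple decision rule. The sketch below encodes it in Python as an illustration only: `DocumentTraits` and `suggest_parse_provider` are hypothetical names, not part of Pegasus, and the traits are things you judge yourself before opening the Parse Provider selector. It also assumes that any Marker criterion outweighs the wish for fast training, since the Marker cases are about accuracy or sensitivity.

```python
from dataclasses import dataclass


# Hypothetical description of a document; Pegasus does not compute these.
@dataclass
class DocumentTraits:
    scanned: bool = False                  # image-based PDF that needs OCR
    dense_tables_or_figures: bool = False  # tables, formulas, or figures carry meaning
    sensitive: bool = False                # you prefer parsing on the local pipeline


def suggest_parse_provider(doc: DocumentTraits) -> str:
    """Return "Marker" or "LlamaParse" following the checklists on this page."""
    # Assumption: any Marker criterion wins over training speed.
    if doc.scanned or doc.dense_tables_or_figures or doc.sensitive:
        return "Marker"
    # Otherwise the document is text-native with a simple layout,
    # so the faster parser fits.
    return "LlamaParse"


print(suggest_parse_provider(DocumentTraits(scanned=True)))  # Marker
print(suggest_parse_provider(DocumentTraits()))              # LlamaParse
```

A document that is text-native, simply laid out, and not sensitive falls through to LlamaParse; anything else points to Marker and a longer training run.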
Changing the provider between trainings
You can switch the provider before each training. The choice applies to the next training only; previous parsed outputs are not retroactively re-parsed.
What happens if a parse fails
If the parser fails on a file, training reports an error and the bot returns to the status it had before training started. The bot's "Last error" field shows a brief reason. To recover, try the other parser, or simplify or split the document that failed.