Parse providers

Parsing is the first phase of training: Pegasus extracts text and structure from your raw documents and converts them to Markdown. The bot's vector index is built from that Markdown. Different parsers can produce different Markdown from the same document, so choose the one that fits your content.

Pegasus offers two parsers

Provider	Best for	Trade-off
Marker (Slower & Secure)	Sensitive content, complex PDFs with tables/figures, scanned documents	Slower — extra OCR + layout work
LlamaParse (Fast & Stable)	Clean text documents, simple structure, fast turnaround	Less reliable on scanned PDFs and very visual layouts

You choose the parser with the "Parse Provider" selector in the Source settings tab.

When to pick Marker

The document is a scanned PDF (image-based) and you need accurate text.
The document has dense tables, formulas, or figures that carry meaning.
The content is sensitive and you prefer keeping parsing on the local pipeline.

Expect training to take noticeably longer.

When to pick LlamaParse

The document is text-native (exported from Word, generated from a template, or already Markdown).
You want training to finish as quickly as possible.
The visual layout is simple: paragraphs and headings.
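
The criteria in the two lists above can be condensed into a small decision helper. This is only an illustrative sketch: `DocumentTraits` and `choose_parse_provider` are hypothetical names, not part of any Pegasus API, and the provider itself is still set through the "Parse Provider" selector.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical description of a document; not a Pegasus type."""
    is_scanned: bool = False         # image-based PDF that needs OCR
    has_dense_layout: bool = False   # tables, formulas, or figures that carry meaning
    is_sensitive: bool = False       # parsing should stay on the local pipeline


def choose_parse_provider(traits: DocumentTraits) -> str:
    """Return "Marker" if any Marker criterion applies, otherwise "LlamaParse"."""
    if traits.is_scanned or traits.has_dense_layout or traits.is_sensitive:
        return "Marker"
    # Text-native document with a simple layout: favor the faster parser.
    return "LlamaParse"


print(choose_parse_provider(DocumentTraits(is_scanned=True)))  # Marker
print(choose_parse_provider(DocumentTraits()))                 # LlamaParse
```

The helper treats the Marker criteria as overriding: a single match is enough to accept the slower training, since the LlamaParse criteria describe the absence of those conditions.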

Changing the provider mid-life

You can switch the provider before each training run. The choice applies to the next training only; output parsed in earlier trainings is not re-parsed retroactively.

What happens if a parse fails

If the parser fails on a file, training reports an error and the bot returns to its previous status. The bot's "Last error" field shows a brief reason. Try the other parser, or simplify/split the offending document.
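
The "try the other parser" advice amounts to a single retry with the alternative provider. The sketch below illustrates that pattern under stated assumptions: `ParseError`, `train_with_fallback`, and the `train` callable are hypothetical stand-ins, since this page does not describe a programmatic way to start training.

```python
from typing import Callable

PROVIDERS = ("Marker", "LlamaParse")


class ParseError(Exception):
    """Hypothetical stand-in for a failed parse; not a Pegasus exception."""


def train_with_fallback(train: Callable[[str], None], first_choice: str) -> str:
    """Run train(first_choice); on ParseError, retry once with the other provider.

    Returns the name of the provider that succeeded. If the second attempt
    also fails, its ParseError propagates to the caller.
    """
    if first_choice not in PROVIDERS:
        raise ValueError(f"unknown provider: {first_choice!r}")
    other = PROVIDERS[1] if first_choice == PROVIDERS[0] else PROVIDERS[0]
    try:
        train(first_choice)
        return first_choice
    except ParseError:
        train(other)
        return other


def fake_train(provider: str) -> None:
    """Simulated training that fails only when LlamaParse is selected."""
    if provider == "LlamaParse":
        raise ParseError("simulated failure on a scanned PDF")


print(train_with_fallback(fake_train, "LlamaParse"))  # Marker
```

If both providers fail, the remaining option from the text is to simplify or split the offending document before training again.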