Ingestion spec — compiler agent contract
The operational manual for the LLM agent that compiles raw book markdown into wiki pages. Follow this when running an ingest, query, or lint pass.
Read before running: DESIGN.md and wiki/schema.md.
Operation 1 — INGEST
One book at a time. Never batch-ingest.
Steps
- Select a book not yet ingested. Read log.md to confirm. The queue report from python scripts/lint_wiki.py (check 9) ranks pending books by inbound forward-ref count; normally ingest the top entry.
- Read the raw source raw/TR-XX-<slug>.md end to end (the only place in the workflow where you use the TR-XX plumbing ID, and only to locate the file). If the book is long, read in passes.
- Report structure to the human: chapter list, main claims, principal entities (concepts, symbols, figures, traditions) introduced or developed, any cross-references to already-ingested or pending books (refer to them by Arabic title, not TR-XX). Wait for human to confirm or correct.
- Create the book page at wiki/books/<slug>.md: pure slug filename, no TR-XX- prefix. Use the slug from manifest.tsv column 2.
  - Frontmatter per schema.md § book: title, title_fr, type: book, aliases, author, translator, source_file: raw/TR-XX-<slug>.md (the only line on the page where TR-XX may appear), chapters, updated. Do not set books: on a book page.
  - Body: brief bio-bibliographic note, book’s argument in one paragraph, chapter-by-chapter summary (2-4 sentences each), thematic map, list of entities introduced, ## نصوص الشيخ عبد الباقي مفتاح for verbatim Meftah excerpts when present, ## قراءة الموسوعة لتعليقات الشيخ عبد الباقي مفتاح for editor synthesis of Meftah’s distinctive contributions (kept segregated from Guénon’s claims), ## شواهد من الكتب with 2-4 verbatim quotations.
- For each entity that the book introduces or substantially develops:
  - If the page doesn’t exist → create it at wiki/<dir>/<slug>.md per schema.
  - If it exists → update body: add this book’s treatment, append a wiki-link string to the books: frontmatter list, e.g. books: ["[[books/<existing-slug>|<existing title>]]", "[[books/<new-slug>|<new title>]]"]. Never add a bare TR-XX token to books:. Add new aliases/synonyms to aliases:. Add a section ### عند غينون في [[books/<new-slug>|<Arabic title>]] quoting or summarising this book’s angle.
- Wire backlinks. Every [[link]] must resolve. Every book’s entity-list must match what the entity pages claim.
- Update index.md. Add lines for newly created pages under the right category heading.
- Run the linter: python scripts/lint_wiki.py. All 8 checks must return OK before declaring the ingest done. The only expected non-zero line is “intentional forward-refs to future books”, which is the running count of book-pages not yet ingested. If there are no forward-refs, the queue may still report manifest books not yet ingested but without inbound queue pressure. If any check returns FAIL, fix before logging.
- Run the quote provenance checker: python scripts/check_quote_provenance.py. Treat failures as manual-review blockers: either fix the quote against raw/, move non-source prose out of ## نصوص الشيخ عبد الباقي مفتاح, or record why the quote is intentionally outside the raw corpus.
- Append log.md entry (format below).
- Report touched pages to the human. A typical book ingest touches 15-60 pages.
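The books: update in the entity step can be sketched as follows. This is a minimal illustration, not the project's own code: the helper names book_link and add_book are hypothetical, and the TR-\d+ pattern is an assumption about what a bare plumbing ID looks like.

```python
import re

# Assumed shape of a bare plumbing ID such as "TR-01".
TR_TOKEN = re.compile(r"^TR-\d+$")


def book_link(slug: str, title: str) -> str:
    """Build the wiki-link string stored in the books: list."""
    return f"[[books/{slug}|{title}]]"


def add_book(books: list[str], slug: str, title: str) -> list[str]:
    """Return a new books: list with the book appended at most once."""
    if any(TR_TOKEN.match(entry.strip()) for entry in books):
        raise ValueError("bare TR-XX token found in books:")
    prefix = f"[[books/{slug}|"
    if any(entry.startswith(prefix) for entry in books):
        return list(books)  # already listed; leave the existing label alone
    return [*books, book_link(slug, title)]
```

Returning a new list rather than mutating the input keeps the original frontmatter value available for a before/after comparison when reporting touched pages.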
Quality bar per book ingest
- 0 broken [[links]].
- 0 invented citations (every quotation verbatim from raw/).
- 0 blended Meftah/editor voice in source-text sections. Use ## نصوص الشيخ عبد الباقي مفتاح only for verbatim material; use ## قراءة الموسوعة لتعليق الشيخ عبد الباقي مفتاح for synthesis.
- Arabic register consistent with Meftah’s prose.
- No entity left with only type: set but no body.
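The last item in the quality bar lends itself to a mechanical spot check. The sketch below assumes pages open with a YAML frontmatter block delimited by --- lines; split_frontmatter and has_body are hypothetical helpers, not part of scripts/lint_wiki.py.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a page into (frontmatter, body); frontmatter is '' if absent."""
    if text.startswith("---\n"):
        end = text.find("\n---", 3)  # newline that precedes the closing fence
        if end != -1:
            body_start = text.find("\n", end + 1)
            body = "" if body_start == -1 else text[body_start + 1:]
            return text[4:end], body
    return "", text


def has_body(text: str) -> bool:
    """True when the page carries any non-whitespace text after the frontmatter."""
    _, body = split_frontmatter(text)
    return bool(body.strip())
```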
Do NOT during ingest
- Paraphrase quotations into pseudo-citations.
- Put editor synthesis under a heading that implies it is Meftah’s own wording.
- Introduce new type: categories without first updating schema.md.
- Edit raw/*.md to “fix” OCR or translation quirks; note them in the entity page instead.
- Run consecutive ingests without at least a quick browse by the human.
- Write TR-XX anywhere a reader sees it. Book pages use pure slug filenames (haymanat-al-kamm-wa-alamat-akhir-al-zaman.md, no prefix). Inline references use the book’s Arabic short title wiki-linked: [[books/<slug>|هيمنة الكمّ]]. Citations say (هيمنة الكمّ، الفصل X), never (TR-01, ...). The books: property stores wiki-link strings, not TR-XX labels. TR-XX appears only in raw/ and manifest.tsv, plus the source_file: line of a book page’s frontmatter, and nowhere else. See schema.md § “The TR-XX zone rule”.
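The TR-XX zone rule can be spot-checked on a single page with a few lines of Python. This is only a sketch under stated assumptions: the regex assumes IDs look like TR-01, tr_leaks is a hypothetical name, and the sketch exempts any source_file: line without checking that the page is a book page.

```python
import re

# Assumed shape of the plumbing ID, e.g. TR-01.
TR_ID = re.compile(r"\bTR-\d+\b")


def tr_leaks(page_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs where TR-XX appears outside source_file:."""
    leaks = []
    for number, line in enumerate(page_text.splitlines(), start=1):
        if line.lstrip().startswith("source_file:"):
            continue  # the one line where the spec allows TR-XX
        if TR_ID.search(line):
            leaks.append((number, line))
    return leaks
```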
Operation 2 — QUERY
Steps
- Read index.md. Identify candidate pages from category and title.
- Open the candidate pages. For deep questions, follow [[links]] 1-2 hops.
- Answer the human in chat, with every claim cited to a wiki page (which in turn cites raw/TR-XX).
- File the answer if it’s worth preserving: create wiki/queries/YYYY-MM-DD-<slug>.md, link from relevant entity pages’ ## ارتباطات section.
- Append log.md entry.
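Following links 1-2 hops is a bounded breadth-first walk. The sketch below assumes the wiki has already been loaded into a dict mapping page names (such as concepts/x) to page text; pages_within and that dict are illustrative, not an existing project API.

```python
import re
from collections import deque

# Captures the target part of [[target]] or [[target|label]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def pages_within(pages: dict[str, str], start: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first walk over wiki-links; returns {page: hop distance}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for target in WIKI_LINK.findall(pages.get(current, "")):
            target = target.strip()
            if target in pages and target not in seen:
                seen[target] = seen[current] + 1
                queue.append(target)
    return seen
```

Targets that do not resolve to a loaded page are skipped here rather than reported; reporting them is the linter's job.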
Filing criterion
File if the question needed multi-hop reasoning, surfaced a new cross-reference, or is likely to be asked again. Don’t file trivial lookups.
Operation 3 — LINT
Run after every 3-5 ingests, or on demand.
Automated checks — run first
python scripts/lint_wiki.py performs eight mechanical checks:
- TR-XX leakage in reader-facing zones.
- Broken [[wiki-links]] (excluding intentional forward-refs to future books).
- Self-links (page linking to itself).
- Frontmatter sanity (title, type, updated present on every page).
- books: property format (no bare TR-XX).
- Orphan pages (entity pages with no inbound links).
- Index drift (index.md ↔ filesystem).
- Backlink symmetry (entity cites book ⇒ book page links entity).
All eight must return OK. Fix any FAIL; investigate every WARN.
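To make the last check concrete, here is a rough sketch of backlink symmetry over an in-memory wiki. It is not the logic of scripts/lint_wiki.py: it assumes page names are paths like books/<slug>, treats any link from an entity page to a book page as a citation, and uses hypothetical helper names.

```python
import re

# Captures the target part of [[target]] or [[target|label]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def links(text: str) -> set[str]:
    """Return the set of link targets found in a page."""
    return {target.strip() for target in WIKI_LINK.findall(text)}


def asymmetric_backlinks(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (entity, book) pairs where the entity cites the book
    but the book page does not link back to the entity."""
    missing = []
    for name in sorted(pages):
        if name.startswith("books/"):
            continue  # only entity pages are checked
        for target in sorted(links(pages[name])):
            if target.startswith("books/") and target in pages:
                if name not in links(pages[target]):
                    missing.append((name, target))
    return missing
```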
Then run python scripts/check_quote_provenance.py. This is a mechanical provenance assistant for block quotes and Meftah source sections. It is stricter than the wiki linter and may require human judgment, but new unmatched quotes should be fixed or explicitly explained before logging completion.
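The core idea of a provenance check can be sketched as a verbatim substring test. The version below is an assumption-laden illustration, not the behaviour of scripts/check_quote_provenance.py: it only looks at markdown block quotes (lines starting with >), compares against a single raw text, and ignores differences in whitespace.

```python
def block_quotes(page_text: str) -> list[str]:
    """Collect runs of consecutive '>' lines into single quote strings."""
    quotes, current = [], []
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            current.append(stripped.lstrip(">").strip())
        elif current:
            quotes.append(" ".join(current))
            current = []
    if current:
        quotes.append(" ".join(current))
    return quotes


def unmatched_quotes(page_text: str, raw_text: str) -> list[str]:
    """Return quotes that do not occur verbatim (modulo whitespace) in raw_text."""
    haystack = " ".join(raw_text.split())
    return [
        quote for quote in block_quotes(page_text)
        if " ".join(quote.split()) not in haystack
    ]
```

Anything this returns would still need the human judgment the paragraph above describes; a miss may be an OCR quirk in raw/ rather than an invented citation.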
Manual checks — after the automated pass
- Citation integrity: run python scripts/check_quote_provenance.py, then sample 10 citations manually and verify verbatim against raw/.
- Contradictions: scan for pages claiming opposing things about the same entity. Flag in a ## ملاحظات الفحص section on the affected page; don’t auto-resolve.
- Aliases coverage: for common concepts, all Arabic synonyms listed in aliases.
- Stale summaries: book page’s entity list matches entities that actually link back (partially covered by check 8, but review the prose too).
- Tashkīl/tatweel normalisation: verbatim quotes keep source styling; titles and aliases should carry plain forms as alternates so Obsidian search works.
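For the normalisation check, a plain form can be derived with the standard unicodedata module. This is a minimal sketch: plain_form and with_plain_alternates are hypothetical names, and the choice to strip every Unicode combining mark (after NFC composition) is an assumption about what counts as tashkīl here.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida)


def plain_form(text: str) -> str:
    """Strip combining marks (tashkil) and tatweel from a title or alias."""
    composed = unicodedata.normalize("NFC", text)  # keep precomposed letters intact
    return "".join(
        ch for ch in composed
        if ch != TATWEEL and not unicodedata.combining(ch)
    )


def with_plain_alternates(aliases: list[str]) -> list[str]:
    """Return aliases plus their plain forms, preserving order, without duplicates."""
    out: list[str] = []
    for alias in aliases:
        for form in (alias, plain_form(alias)):
            if form not in out:
                out.append(form)
    return out
```

Verbatim quotes should never pass through plain_form; it is meant only for titles and aliases.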
Record the lint result in log.md with counts per issue type. Fix or flag each finding.
Log format
## [YYYY-MM-DD] <ingest|query|lint> — <short subject>
- operation: <ingest|query|lint>
- book: TR-XX (ingest only)
- pages_created: <n>
- pages_updated: <n>
- notable: <one-line takeaway>

Append at the bottom of log.md. Never rewrite history.
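A log entry in this format could be produced with a small helper. The sketch below is illustrative only: log_entry and append_entry are hypothetical names, and requiring book for ingest entries is an assumption drawn from the "(ingest only)" note.

```python
from datetime import date
from typing import Optional

OPERATIONS = {"ingest", "query", "lint"}


def log_entry(operation: str, subject: str, pages_created: int, pages_updated: int,
              notable: str, book: Optional[str] = None,
              day: Optional[date] = None) -> str:
    """Render one log.md entry in the format shown above."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if operation == "ingest" and not book:
        raise ValueError("ingest entries need a book")
    day = day or date.today()
    # \u2014 is the dash separator used in the heading format.
    lines = [f"## [{day.isoformat()}] {operation} \u2014 {subject}",
             f"- operation: {operation}"]
    if operation == "ingest":
        lines.append(f"- book: {book}")  # ingest only
    lines += [f"- pages_created: {pages_created}",
              f"- pages_updated: {pages_updated}",
              f"- notable: {notable}"]
    return "\n".join(lines) + "\n"


def append_entry(log_path: str, entry: str) -> None:
    """Append to the end of the log; mode 'a' never rewrites earlier entries."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("\n" + entry)
```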
What to discuss with the human
Always raise:
- Structural choices a book forces on the taxonomy (new type: needed? existing type needs splitting?).
- Contradictions between this book and an earlier ingested book.
- Translations of Guénon’s French terms where Meftah’s choice is ambiguous.
Don’t ask permission for:
- Routine page creation and linking.
- Fixing obvious broken links.
- Adding an alias to a page.