rag · April 14, 2026 · 11 min read

Beyond RAG: What Karpathy's LLM Wiki Actually Changes

RAG rediscovers knowledge on every query. Karpathy's LLM Wiki compiles it once and lets it compound — here's what that shift means if you actually build these systems.

The Thing RAG Never Solved

I've built enough RAG pipelines to know where they quietly fail. Not on demos — on the second or third month of production, when users start asking questions that require connecting three documents you ingested six weeks apart.

RAG is stateless. Every query is day one. The LLM retrieves chunks, synthesizes an answer, and throws away everything it just figured out. Ask the same question tomorrow — same retrieval, same re-synthesis, same ephemeral answer.

That's not a retrieval problem. That's a knowledge architecture problem.


What Karpathy Posted

On April 3, 2026, Karpathy posted on X:

"LLM Knowledge Bases — Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge."

The tweet went viral. He followed it the next day with a GitHub Gist — a full idea file laying out the system architecture and philosophy.

The premise, stated simply:

Don't retrieve at query time. Compile at ingestion time.

Instead of dumping raw documents into a vector store, you use an LLM to read every incoming source and write a wiki — structured markdown with concept pages, cross-references, summaries, and backlinks. That wiki is the thing you query against. It persists. It compounds.


What Himanshu's Diagram Actually Added

Karpathy described the idea in prose. Himanshu drew the full system.

His diagram made explicit what the tweet left implicit — specifically three things most commentary missed:

The output layer. Karpathy talked about the wiki itself. Himanshu's diagram showed that the wiki is an intermediate artifact, not the end product. From the wiki, the LLM can generate outputs: structured Markdown reports, slides, and charts via Matplotlib. The wiki becomes a source of derived knowledge products, not just a place to ask questions.

The feedback loop. The diagram showed a "filed back to wiki" arrow connecting Q&A output back into the wiki. When you ask a question and get a good answer, that answer becomes a new wiki page. The knowledge base grows from two directions: source ingestion and query exploration. This is the part that makes it genuinely compound — not just inputs accumulating, but your own questions becoming knowledge.

The future directions. The bottom of the diagram shows two end goals Karpathy mentioned but didn't elaborate on: synthetic data gen (fine-tune a small model on the clean wiki) and product vision (beyond hacky scripts, into a real system). That arc — from manual ingestion → structured wiki → training data → custom model — is the real long-term payoff, and Himanshu made it visible as a roadmap rather than a footnote.


The Architecture

```plain
raw/        ← immutable source files you add (articles, papers, repos, images)
wiki/       ← LLM-owned markdown (concepts, summaries, backlinks)
schema.md   ← configuration that tells the LLM how to maintain the wiki
index.md    ← table of contents (~100 articles, one-line summaries)
log.md      ← append-only record of every ingest/lint/query event
```

The LLM runs four operations against this structure:

Compile — new source arrives in raw/, LLM reads it and writes or updates 10–15 wiki pages in one pass. Done once at ingestion time. Not repeated at query time.

Q&A — user asks a question, LLM reads index.md first (to find relevant pages), loads those specific pages, synthesizes an answer with wiki-link citations. If the answer is worth keeping, it gets filed back as a new page.

Lint — periodic health check across the full wiki. Finds contradictions between pages, stale claims, orphaned concepts, missing links. Runs weekly or after significant ingestion.

Index — the LLM keeps index.md current, refreshing one-line summaries and backlinks as pages are added or updated.


Backlinks, index, lint — these three words get thrown around, but each is doing a specific job here. Worth understanding concretely.

Backlinks — imagine a wiki page on Transformer Architecture. Inside it, you reference Attention Mechanism and Positional Encoding. Those are forward links. A backlink is the reverse: the Attention Mechanism page automatically knows it was mentioned by Transformer Architecture, BERT, and GPT-2. When you open the Attention page, you see the full web of pages that depend on it. That's the knowledge graph. In RAG, every document is an island — no page knows it exists in relation to anything else.
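Mechanically, backlinks are just forward links inverted. A minimal sketch — the `[[wiki-link]]` syntax follows the pattern used later in this post, and the page names are illustrative:

```python
import re

def build_backlinks(pages: dict[str, str]) -> dict[str, list[str]]:
    """Invert forward [[wiki-links]] into a backlink map.

    pages maps page title -> markdown body. Returns, for each page,
    the list of pages that link *to* it.
    """
    backlinks: dict[str, list[str]] = {title: [] for title in pages}
    for title, body in pages.items():
        for target in re.findall(r"\[\[([^\]]+)\]\]", body):
            if target in backlinks and title not in backlinks[target]:
                backlinks[target].append(title)
    return backlinks

pages = {
    "Transformer Architecture": "Uses [[Attention Mechanism]] and [[Positional Encoding]].",
    "BERT": "Built on [[Attention Mechanism]].",
    "Attention Mechanism": "Core operation of transformers.",
    "Positional Encoding": "Injects order information.",
}
```

Opening the Attention Mechanism entry in the resulting map shows every page that depends on it — exactly the web RAG never builds.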

Index — once your wiki grows past ~100 pages, the LLM can't hold it all in context. The index is a lightweight table of contents: page title, one-line summary, category. When you ask a question, the LLM reads the index first to decide which wiki pages are relevant, then loads only those. Same reason a textbook has a table of contents before 600 pages — you don't re-read everything to find one concept.
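Generating that table of contents is mechanical once pages carry structured metadata. A sketch, assuming each page contributes a title, one-line summary, and category (the entry fields here are illustrative, loosely matching the frontmatter shown later):

```python
def render_index(entries: list[dict]) -> str:
    """Render a lightweight index.md: page title, one-line summary, category.

    Grouping by category keeps the index scannable for the routing step.
    """
    by_category: dict[str, list[dict]] = {}
    for e in entries:
        by_category.setdefault(e["category"], []).append(e)

    lines = ["# Index", ""]
    for category in sorted(by_category):
        lines.append(f"## {category}")
        for e in sorted(by_category[category], key=lambda e: e["title"]):
            lines.append(f"- [[{e['title']}]] — {e['summary']}")
        lines.append("")
    return "\n".join(lines)

entries = [
    {"title": "Attention Mechanism", "summary": "Weighted lookup over sequence positions.", "category": "Concepts"},
    {"title": "Scaling Laws", "summary": "Loss falls predictably with compute.", "category": "Concepts"},
    {"title": "GPT-2", "summary": "1.5B-parameter decoder-only transformer.", "category": "Models"},
]
```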

Lint — here's a concrete example. In March you ingested a paper saying "GPT-4 context window is 8K tokens." In April you added OpenAI's updated docs saying it's 128K. RAG doesn't care — both chunks coexist, one will win retrieval based on score. A lint pass reads the full wiki, flags that contradiction, and either updates the stale page or surfaces it for your review. It also catches orphaned pages (a concept referenced everywhere but no page actually exists for it) and dead ends (pages nothing links to, probably shouldn't be in the wiki). This is what makes the wiki trustworthy over time rather than just large.
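The contradiction check genuinely needs an LLM, but dead references and orphans are mechanical. A sketch of that deterministic half, again assuming the `[[wiki-link]]` convention:

```python
import re

def lint_links(pages: dict[str, str]) -> dict[str, list[str]]:
    """Flag dead references (links to pages that don't exist) and
    orphans (pages nothing links to).

    Contradiction-finding still needs the LLM pass; this covers
    only the purely structural checks.
    """
    link_re = re.compile(r"\[\[([^\]]+)\]\]")
    linked_to: set[str] = set()
    dead: list[str] = []
    for title, body in pages.items():
        for target in link_re.findall(body):
            linked_to.add(target)
            if target not in pages:
                dead.append(f"{title} -> {target}")
    orphans = [t for t in pages if t not in linked_to]
    return {"dead_references": dead, "orphans": orphans}
```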

Note

Think of lint as the difference between a maintained Wikipedia article and a Google Doc nobody's touched since 2021.

The Schema File — the Most Important Piece

schema.md (or CLAUDE.md / AGENTS.md depending on your agent) is what makes this system coherent across sessions. It's the persistent configuration that tells the LLM exactly how to behave when it touches the wiki.

A minimal skeleton:

```markdown
# Wiki Schema

## Directory Structure
- `raw/` — immutable. Never modify source files.
- `wiki/` — you own this entirely. Create, update, delete pages as needed.
- `raw/assets/` — local copies of images, referenced in wiki via relative paths.

## Page Format
Every wiki page must have this frontmatter:

---
title: <page title>
type: concept | entity | source-summary | comparison
sources: [list of raw/ files this page draws from]
related: [list of wiki pages linked from this page]
created: YYYY-MM-DD
updated: YYYY-MM-DD
confidence: high | medium | low
---

Body: TLDR (2-3 sentences) → main content → counterarguments or caveats

## Ingest Workflow
When a new file arrives in raw/:
1. Read it fully before writing anything.
2. Identify 3–8 key concepts. For each: find existing wiki page or create one.
3. Update cross-references — if you mention concept X, link to wiki/X.md.
4. Add backlinks — update the related: field on every page you reference.
5. Append to log.md: `## [YYYY-MM-DD] ingest | <source title>`

## Query Workflow
When asked a question:
1. Read index.md first. Identify relevant pages.
2. Load those pages. Synthesize answer with [[wiki-link]] citations.
3. If the answer is worth keeping, ask: "File this as a wiki page?"

## Lint Workflow
Scan the full wiki and report:
- Contradictions between pages (same claim, different values)
- Orphaned pages (no incoming backlinks)
- Dead references (links to pages that don't exist)
- Stale dates (confidence: high on pages older than 6 months)
```
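A schema only pays off if pages actually conform to it. Here is a minimal validator for the frontmatter above (the field names come from the schema; the line-by-line parsing is a simplification for illustration, not a full YAML parser):

```python
REQUIRED_FIELDS = {"title", "type", "sources", "related", "created", "updated", "confidence"}
ALLOWED_TYPES = {"concept", "entity", "source-summary", "comparison"}

def validate_frontmatter(page_text: str) -> list[str]:
    """Return a list of schema violations (empty list = valid page)."""
    errors: list[str] = []
    if not page_text.startswith("---"):
        return ["missing frontmatter block"]
    try:
        # Split into [empty, frontmatter, body]
        _, raw, _ = page_text.split("---", 2)
    except ValueError:
        return ["unterminated frontmatter block"]
    fields = {}
    for line in raw.strip().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    for missing in sorted(REQUIRED_FIELDS - fields.keys()):
        errors.append(f"missing field: {missing}")
    if fields.get("type") not in ALLOWED_TYPES:
        errors.append(f"invalid type: {fields.get('type')}")
    return errors
```

Running this as part of the lint pass catches malformed pages before they pollute the index.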

Rule of thumb

Time spent on schema.md pays back on every single ingest. A vague schema produces vague pages. A specific one produces something you'll actually trust.

The Query → Wiki Feedback Loop

This is the part that makes the system genuinely compound, and it's easy to miss.

Standard flow: you ask a question, get an answer, close the chat. Tomorrow that synthesis is gone.

LLM Wiki flow: you ask a question, get an answer with citations like [[Attention Mechanism]] and [[Scaling Laws]]. The LLM then asks: "This answer synthesizes three sources in a way that isn't currently captured anywhere in the wiki. Should I file it as a new page?" You say yes, it writes wiki/attention-vs-scaling-tradeoffs.md, links it to the related pages, updates the index.

Now that question you asked is permanently part of the knowledge base. The next time someone asks something adjacent, the LLM finds that synthesis directly — it doesn't have to re-derive it.
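On disk, the file-back step can be as simple as the sketch below, using the frontmatter format from the schema (the slug and index-append logic are illustrative assumptions, not the exact mechanism Karpathy describes):

```python
from datetime import date
from pathlib import Path

def file_back(answer: str, title: str, wiki_root: str, sources: list[str]) -> Path:
    """Persist a Q&A answer as a wiki page and register it in index.md."""
    today = date.today().isoformat()
    slug = title.lower().replace(" ", "-")
    page = (
        "---\n"
        f"title: {title}\n"
        "type: comparison\n"
        f"sources: {sources}\n"
        "related: []\n"
        f"created: {today}\n"
        f"updated: {today}\n"
        "confidence: medium\n"
        "---\n\n"
        f"{answer}\n"
    )
    path = Path(wiki_root) / "wiki" / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(page)
    # Register the new page in the index so the routing step can find it
    index = Path(wiki_root) / "wiki" / "index.md"
    first_line = answer.strip().splitlines()[0][:80]
    with index.open("a") as f:
        f.write(f"- [[{title}]] — {first_line}\n")
    return path
```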

Your curiosity compounds. That's the actual value, and it doesn't happen in RAG at all.


RAG vs LLM Wiki — the honest comparison

The difference isn't retrieval quality. It's whether knowledge accumulates.

```plain
RAG:      ingest → embed → store → [query: retrieve → synthesize → discard]
LLM Wiki: ingest → compile → store → [query: read wiki → synthesize → file back]
```

With RAG, three papers on the same topic stay three papers. With LLM Wiki, after ingestion they're one concept page with contradictions flagged and relationships explicit.

"The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping." — Karpathy's Gist

LLMs are infinitely patient bookkeepers.


The Code

```python
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

def load_schema(wiki_root: str) -> str:
    schema_path = Path(wiki_root) / "schema.md"
    return schema_path.read_text() if schema_path.exists() else ""

def load_wiki_index(wiki_root: str) -> str:
    index_path = Path(wiki_root) / "wiki" / "index.md"
    return index_path.read_text() if index_path.exists() else ""

def compile_source(raw_file: str, wiki_root: str) -> None:
    """Ingest a new raw source into the wiki."""
    schema = load_schema(wiki_root)
    source_text = Path(raw_file).read_text()
    existing_index = load_wiki_index(wiki_root)

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=8192,
        system=f"{schema}\n\nCurrent wiki index:\n{existing_index}",
        messages=[{
            "role": "user",
            "content": f"""New source to compile into the wiki:

{source_text}

Following the schema:
1. Identify key concepts — check the index for existing pages to update vs new ones to create.
2. Write each page in the required format (frontmatter + TLDR + body + caveats).
3. Add backlinks — update related: fields on pages you reference.
4. Append to log.md.

Output format: for each file, write `=== wiki/filename.md ===` then the full content."""
        }]
    )

    # Parse and write pages from the response
    _write_pages_from_response(response.content[0].text, wiki_root)


def lint_wiki(wiki_root: str) -> str:
    """Run a health check across the full wiki."""
    schema = load_schema(wiki_root)
    wiki_dir = Path(wiki_root) / "wiki"

    all_pages = {}
    for md_file in wiki_dir.glob("**/*.md"):
        all_pages[str(md_file.relative_to(wiki_root))] = md_file.read_text()

    pages_combined = "\n\n---\n\n".join(
        f"FILE: {path}\n{content}"
        for path, content in all_pages.items()
    )

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=4096,
        system=schema,
        messages=[{
            "role": "user",
            "content": f"""Run a lint pass on the full wiki. Find:
- Contradictions (same claim, conflicting values across pages)
- Orphaned pages (no incoming backlinks from other pages)
- Dead references (links to pages that don't exist)
- Stale high-confidence claims on pages older than 6 months

Wiki contents:
{pages_combined}

Output a structured report with file:line references for each issue."""
        }]
    )

    report = response.content[0].text
    lint_report_path = Path(wiki_root) / "lint-report.md"
    lint_report_path.write_text(report)
    return report


def query_wiki(question: str, wiki_root: str, file_back: bool = False) -> str:
    """Query the wiki. Optionally file the answer back as a new page."""
    schema = load_schema(wiki_root)
    index = load_wiki_index(wiki_root)

    # Step 1: identify relevant pages from the index
    routing = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Given this wiki index:\n{index}\n\nQuestion: {question}\n\nList the wiki page filenames most relevant to answering this. Filenames only."
        }]
    )

    relevant_files = _parse_filenames(routing.content[0].text)

    # Step 2: load those pages and answer
    wiki_dir = Path(wiki_root) / "wiki"
    loaded_pages = "\n\n---\n\n".join(
        (wiki_dir / f).read_text()
        for f in relevant_files
        if (wiki_dir / f).exists()
    )

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=f"{schema}\n\nWiki pages:\n{loaded_pages}",
        messages=[{
            "role": "user",
            "content": f"{question}\n\nUse [[wiki-link]] citations. If this answer synthesizes something not currently captured in the wiki, flag it for filing."
        }]
    )

    answer = response.content[0].text

    if file_back:
        # compile_source_from_text — same as compile_source but takes a string
        # instead of a file path. Writes the answer to raw/ then runs compile.
        compile_source_from_text(answer, wiki_root, source_type="query-result")

    return answer
```

Note

_write_pages_from_response and _parse_filenames are simple string parsing helpers — split on the === wiki/filename.md === delimiter and write each chunk to disk. The schema is what makes the output structured enough to parse reliably.

About qmd

When the wiki grows past ~100 pages, reading the full index every query starts to strain context. qmd (github.com/tobi/qmd) is a local CLI search engine built specifically for markdown knowledge bases. It runs BM25 full-text search, vector semantic search, and LLM re-ranking — all on-device via node-llama-cpp with GGUF models. No API calls.

```bash
# Install
npm install -g qmd

# Index your wiki
qmd index ./wiki

# Hybrid search (BM25 + vector + rerank)
qmd query "how does attention scale with sequence length"
```

It also exposes an MCP server, so your LLM agent can call it directly rather than you writing the routing step manually. At small scale you don't need it. Past 200 pages, you probably do.


Where This Actually Applies

This isn't a replacement for RAG at enterprise scale. Karpathy explicitly scopes it: a bounded, curated corpus — ~100 articles, a research domain you're actively building knowledge in.

Where it fits well:

  • Personal research system (papers, blog posts, notes)
  • Domain-specific internal knowledge base with slow-moving documents
  • Any system where synthesis quality matters more than ingestion volume

Where RAG still wins:

  • Large, fast-changing corpora
  • Real-time document ingestion at scale
  • Multi-tenant systems where per-user wikis aren't practical

What I'm Actually Taking From This

The part that landed hardest isn't the wiki format. It's the lint pass.

In every RAG pipeline I've built, there's dead knowledge — documents contradicting each other, concepts that got redefined as the domain evolved, relationships that exist in the data but never surface at query time. We don't clean it because it's tedious and there's no mechanism for it.

A periodic LLM pass that reads the whole knowledge base and surfaces inconsistencies — that's something worth bolting onto existing RAG pipelines even if you never adopt the full wiki pattern.

The bigger shift is treating the LLM as a knowledge author, not just a retriever. RAG asks the LLM to be a fast reader. LLM Wiki asks it to be a librarian. Those are different jobs, and the second one is actually closer to what LLMs are good at.


Where It Goes Next

Karpathy's gist ends with two future directions shown explicitly in Himanshu's diagram:

Synthetic data gen — once the wiki is clean and structured, fine-tune a smaller model on it. The curated knowledge base becomes training data. The wiki isn't just a retrieval artifact — it's a path to a private, domain-specific model without the overhead of a full fine-tuning pipeline on raw documents.
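One plausible shape for that conversion: each wiki page becomes an instruction-tuning pair in chat-format JSONL. The pair format below is an assumption for illustration, not anything Karpathy specified:

```python
import json

def wiki_to_jsonl(pages: dict[str, str]) -> str:
    """Turn wiki pages into instruction-tuning pairs: the page title
    becomes a question, the page body becomes the target answer."""
    lines = []
    for title, body in pages.items():
        record = {
            "messages": [
                {"role": "user", "content": f"Explain: {title}"},
                {"role": "assistant", "content": body.strip()},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

The lint pass matters here: training on a wiki with unresolved contradictions bakes those contradictions into the model.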

Beyond hacky scripts — proper tooling around the ingest/compile/lint loop. The open-source ecosystem is already moving fast here: MCP server implementations, Claude Code agent skills, and CLI tools appeared within days of the gist. The pattern is settling; the tooling is catching up.

The idea that a knowledge base should be continuously maintained by the same LLM that uses it — that feels like the direction everything is heading. RAG was the right answer when context windows were small and LLMs were slow. Both of those constraints are eroding. The architecture should follow.
