Documentation Patterns
Documentation Patterns measures whether your docs site uses the structural conventions AI agents
recognize as documentation. The category covers schema markup that names what each page is,
code-block markup that survives extraction, and the agent-metadata trio (llms.txt,
agents.md, .well-known/mcp.json) that discloses the site's contents at the
root level.
What this category measures
Documentation Patterns is a docs-vertical category in the AI Readiness rubric. It measures three classes of structural signals: schema markup that names what each page is, code-block markup that survives extraction, and the agent-metadata files that disclose what a site contains at the root level. The category sits at two steps of the AI-agent loop — Step 2 (Parse) for the page-level signals, Step 3 (Structure) for the site-level metadata. For the loop in detail, see /docs/how-ai-agents-read-your-docs.
The schema-markup class includes TechArticle, APIReference, and
HowTo. These are content-type identifiers — the JSON-LD blocks that tell an AI agent
the page is a technical article, an API reference, or a step-by-step procedure. Without them, the
agent has prose under an H1 and no typed handle on what kind of content it is parsing. With them,
the agent can confidently retrieve the page when a user's question maps to the content type.
The code-block class is narrower but high-leverage. Real <pre><code class="language-X">
markup survives the agent's parse with the "this is code" boundary intact. JS-decorated
<div> syntax highlighting that strips the <code> semantics does
not; the agent loses the structural cue that the block is meant to be copied verbatim into a
developer's editor.
The agent-metadata trio comprises llms.txt, agents.md, and
.well-known/mcp.json. The rubric scores each file at two tiers: presence (does the file exist?)
and content quality (is the file populated correctly?), a tier added in v4.1 per Phase 4.3. The hard cap
agent_metadata_trio_absent (75) fires when all three trio files are absent, a
structural signal that no one has thought about agents at all.
The canonical weights, per-check details, and hard-cap arithmetic live at /methodology. For the broader 8-category overview at the conceptual level, see /docs/ai-readiness.
Why it matters
AI Readiness measures how well AI systems can understand, retrieve, cite, and act on your content. Documentation Patterns is the category most directly responsible for the "understand" and "act on" parts of that measurement. Schema markup is what lets the agent know it is reading a TechArticle and not a marketing page. Code-block markup is what lets the agent extract code that runs in the developer's environment, not text-shaped fragments that paste broken.
The consumers are the AI agents reading documentation to answer developer questions — ChatGPT, Claude, Perplexity, Gemini — and increasingly the AI agents that write integration code on developers' behalf. The first audience cites; the second composes. Both depend on the same structural signals.
A low score in Documentation Patterns degrades both consumers' ability to use the docs site
cleanly. Without TechArticle schema, the page lacks the typed handle that lets agents
recognize it as a canonical technical reference rather than generic web content. Without
<pre><code> markup, the page lacks the structural boundary agents use to
extract code as code; copied fragments paste as prose. Without llms.txt, the site
lacks a top-level summary, and agents fall back to inferring shape from crawled pages — slower
and less complete than reading a curated index.
Documentation Patterns is the docs-vertical differentiator in the rubric. It is the category that separates the Docs Readiness Audit from a generic AEO scan. Every other category — AI Crawlability & Access, Structured Data, AEO Readiness, and the rest — applies to any site type. Documentation Patterns asks whether the site is shaped like docs. For the deep comparison of where AEO and SEO diverge on this dimension, see /docs/aeo-vs-seo.
Every check in this category is a structural test, not a judgment call. The rubric defines a fact pattern (schema present and valid? code blocks marked up correctly? metadata files present?), tests for it, and scores. The same content scored against the same rubric returns the same Score, by design — the determinism property that makes the audit actionable. The canonical definition of that property lives at /methodology.
What a high-scoring page looks like
Three examples that demonstrate the category in practice. Each is copy-paste-runnable; together they cover the three signal classes named above.
Example 1: TechArticle JSON-LD. A documentation page about Stripe
webhooks ships with a <script type="application/ld+json"> block at the top of
the document body:
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "How to verify a Stripe webhook payload",
  "description": "Verify webhook signatures using the Stripe-Signature header and the endpoint signing secret.",
  "author": {
    "@type": "Organization",
    "name": "Stripe Documentation Team"
  },
  "datePublished": "2026-04-15",
  "dateModified": "2026-04-25",
  "proficiencyLevel": "Intermediate",
  "dependencies": "Stripe SDK 11+, Node 18+"
}
The agent reads this block before reading the prose. It now knows the page is a technical article
(not a marketing page), names the author canonically, has explicit dates, and identifies
prerequisites. The proficiencyLevel and dependencies fields help the
agent decide whether to surface the page when a user's question matches the level and stack.
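For completeness, a minimal sketch of how that block embeds in the page, abbreviated here to the headline field; in practice the full JSON-LD object above goes inside the script element:

<body>
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "headline": "How to verify a Stripe webhook payload"
  }
  </script>
  <h1>How to verify a Stripe webhook payload</h1>
  <!-- article prose and code examples follow -->
</body>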
Example 2: real <pre><code> markup. Compare two
implementations of the same code block.
Bad — JS-decorated <div> with no semantic markup:
<div class="code-highlight">
<span class="token-keyword">const</span> sig = req.headers['stripe-signature'];
</div>
Good — semantic <pre><code> with a language attribute:
<pre><code class="language-javascript">
const sig = req.headers['stripe-signature'];
</code></pre>
The bad version renders identically in the browser. The good version survives the agent's parse.
AI agents extract code by looking for <pre><code> boundaries; the bad
version has neither, so the agent treats the content as prose. A developer who copies from the good
version gets runnable code; a developer who copies from the bad version gets text fragments stripped
of context.
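To see why the boundary matters mechanically, here is an illustrative extractor: not any specific agent's implementation, just the selector logic the <pre><code> convention makes possible. It runs in a browser console on any docs page:

// Find every semantic code block and report its declared language.
// The bad <div class="code-highlight"> version matches neither the
// pre > code selector nor carries a language- class, so it is
// invisible to this pass.
for (const block of document.querySelectorAll('pre > code')) {
  const lang = [...block.classList].find((c) => c.startsWith('language-'));
  console.log(lang ?? '(no language attribute)', block.textContent.trim());
}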
Example 3: a minimal llms.txt. The file lives at the site root and
follows the llmstxt.org spec — H1 site title, H2 sections, links
with one-line summaries:
# Stripe Documentation
Reference for the Stripe API and integrations.
## Docs
- [Webhooks](/docs/webhooks): Receive events from Stripe in real time
- [Authentication](/docs/keys): API keys and authentication patterns
- [Payments](/docs/payments): Charge cards, manage subscriptions, handle refunds
## Methodology
- [API stability policy](/api-versioning): Deprecation timelines and migration support
Two sections, four links. Each link carries a one-line summary the agent reads before deciding
whether to crawl the linked page. For sites with hundreds of docs pages, llms.txt is
the index that turns a slow inferential crawl into a fast targeted retrieval.
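To make the consumption side concrete, an illustrative fetch-and-parse in the same JavaScript as the extractor sketch above: not a real agent's crawler, just the few lines it takes to turn the file into a link index. The host is a placeholder, the regex assumes the spec's "- [title](url): summary" link shape shown above, and the script runs as a Node 18+ ES module:

// Fetch llms.txt and extract its link index.
const res = await fetch('https://docs.example.com/llms.txt'); // placeholder host
const text = await res.text();

// Match the spec's link shape: - [title](url): one-line summary
const linkLine = /^- \[(.+?)\]\((.+?)\):\s*(.+)$/;
for (const line of text.split('\n')) {
  const m = line.match(linkLine);
  if (m) console.log({ title: m[1], url: m[2], summary: m[3] });
}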
Common failure modes
Five patterns drag a site's score down in this category. Each is structural, recoverable, and named in the lightning-scan output where it fires.
- TechArticle schema entirely absent. The most common pattern in Lightning Scan results. The lightning-scan check returns "No TechArticle / APIReference / HowTo schema — these help AI engines retrieve docs as docs." The typed handle is missing, so agents have no schema-level signal that the page is a technical reference rather than marketing copy. Recovery: add a TechArticle JSON-LD block per Example 1 above.
- Code blocks rendered as JS-decorated <div> elements. Some legacy docs platforms ship code blocks as colored <span> elements inside <div> containers, with no <pre><code> semantics. The lightning-scan check returns "No code blocks found in raw HTML — docs sites should use <pre><code> markup so AI can extract examples." The page renders correctly in a browser; the agent extracts text fragments stripped of code semantics. Recovery: render code blocks server-side or use a syntax highlighter that emits <pre><code class="language-X"> semantics in the raw HTML.
- llms.txt absent at root. No top-level summary of what the site contains. Agents fall back to inferring site shape from crawled pages — slower and less complete than reading a curated index. Recovery: ship a minimal llms.txt per Example 3; a minimal file ships in dozens of lines, and sites with hundreds of docs pages benefit most.
- agents.md absent. No interaction contract for how the agent should crawl the site — preferred rate, allowed surfaces, available tools. The agent uses its defaults. The penalty is smaller than llms.txt's because the agents.md specification is in flux as of 2026 — multiple proposals are circulating across Markdown, YAML, and JSON variants. Recovery: ship a structurally-valid file in whichever format you prefer (see the sketch after this list); agents that read the file today are forgiving on format and strict on the absence of one.
- .well-known/mcp.json absent. No Model Context Protocol discovery file. The smallest penalty of the three trio files today, but the signal grows as MCP adoption accelerates. Recovery: ship a minimal MCP discovery file describing the site's queryable resources.
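Because the agents.md spec is in flux, the sketch below is one possible shape rather than a standard. Every section name and field is illustrative, chosen to match the interaction-contract framing above (crawl rate, allowed surfaces, tools); the search endpoint is a hypothetical example, not a real URL:

# Agents

This site is a documentation reference. Agents are welcome to crawl it.

## Crawling
- Preferred rate: 1 request per second
- Allowed surfaces: /docs, /api
- Canonical index: /llms.txt

## Tools
- Search: GET /api/search?q={query} returns JSON results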
When all three trio files are absent, the rubric's hard cap
agent_metadata_trio_absent fires and caps the entire site Score at 75 regardless of
how strong the rest of the structural signals are. The fix is the smallest absolute amount of work
in the rubric and the largest single score impact — three files at the site root, each measured in
dozens of lines.
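The fastest way to confirm the trio ships is to check the three paths directly. A minimal sketch in the same JavaScript as the other examples, run as a Node 18+ ES module; the base URL is a placeholder for your own docs host:

// Check the agent-metadata trio for presence at the site root.
const base = 'https://docs.example.com'; // placeholder: your docs host
const trio = ['/llms.txt', '/agents.md', '/.well-known/mcp.json'];

for (const path of trio) {
  // HEAD avoids downloading the body; swap in GET if the server rejects HEAD.
  const res = await fetch(base + path, { method: 'HEAD' });
  const status = res.ok ? 'present' : `absent or blocked (${res.status})`;
  console.log(`${path}: ${status}`);
}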
What you can do today
Three concrete steps:
- Run a free Lightning Scan of your site to see your AI Readiness Score in the Documentation Patterns category specifically. The per-category breakdown shows whether the score gap is in schema markup, code-block markup, or the agent-metadata trio — the three signal classes named throughout this article.
- Read /methodology for the canonical weights, hard caps, and per-check details. The Documentation Patterns category weight and the agent_metadata_trio_absent (75) hard-cap arithmetic live there.
- Read the per-topic articles in /docs as they ship. Each signal class in this category — schema markup, code-block patterns, agent-metadata trio — earns its own deep article.
The articles in /docs exist to make AI Readiness an addressable property, one decision at a time, with examples that copy-paste cleanly.