Structured Data

Structured Data measures schema.org markup beyond docs-specific types — JSON-LD presence, required-field coverage, schema-type breadth, and FAQPage. Without this markup layer, AI agents must infer direct answers from unstructured prose, which produces inconsistent citations. FAQPage is the lowest-friction win; Organization schema is how the company name enters an agent's answers.

What Structured Data measures

One of eight categories in Obaron's AI Readiness rubric (introduced in /docs/ai-readiness), Structured Data covers the schema.org vocabulary beyond docs-specific types. TechArticle, APIReference, and HowTo schema live in the Documentation Patterns category because they name what kind of documentation a page is. Structured Data picks up the rest — Article, FAQPage, Organization, WebSite, and BreadcrumbList — which name what kind of site a page belongs to and what questions the page answers.

The rubric tests four structural properties:

  • JSON-LD presence. Is a <script type="application/ld+json"> block in the raw HTML response? Five points. Without it, the three checks below all mark as dependent — nothing to evaluate.
  • Required fields. Does each schema type carry its minimum required fields? Three points. A FAQPage block with an empty mainEntity array and an Organization block with no name field are common failures that pass the presence check but fail here.
  • Schema-type breadth. How many distinct schema types are present? Three or more earns full credit; one or two earns partial. Three points.
  • FAQPage. Is FAQPage one of the schema types? Two points. FAQPage is scored separately from breadth because of its direct relationship to AI-agent direct-answer extraction.
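The point structure above can be sketched as a small scoring function. This is an illustrative model, not Obaron's implementation: the point weights and the zero-on-missing-JSON-LD gating come from this article, but the partial-credit value for breadth is an assumption — the rubric only says "partial."

```python
def structured_data_score(has_json_ld: bool,
                          fields_valid: bool,
                          type_count: int,
                          has_faqpage: bool) -> int:
    """Sketch of the Structured Data category score (max 13 points).

    Weights follow the four checks described above. The 1-point
    partial credit for one or two schema types is an assumption.
    """
    if not has_json_ld:
        return 0  # dependent checks: nothing to evaluate

    score = 5                          # JSON-LD presence
    score += 3 if fields_valid else 0  # required fields
    if type_count >= 3:                # schema-type breadth
        score += 3
    elif type_count >= 1:
        score += 1                     # assumed partial credit
    score += 2 if has_faqpage else 0   # FAQPage present
    return score
```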

The category maps to Step 2 (Parse) in the four-step agent loop. When an agent parses the raw HTML response, it reads the JSON-LD blocks as the first signal layer — identifying content type, publisher identity, and answer candidates before touching the body text. See /methodology for the canonical category weights and hard caps.
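The parse step can be sketched with the Python standard library alone — a minimal illustration of pulling JSON-LD blocks out of a raw HTML response, with error handling reduced to skipping unparseable blocks:

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD: skip it, don't crash the parse


def extract_json_ld(raw_html: str) -> list:
    """Return every parsed JSON-LD object found in the raw HTML string."""
    parser = JSONLDExtractor()
    parser.feed(raw_html)
    return parser.blocks
```

Because this reads the raw response string, anything injected after JavaScript executes is invisible to it — the same constraint the rubric's presence check applies.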

Why it matters

FAQPage is the schema type that maps most directly to how AI agents generate direct answers. A FAQPage block is a structured list of questions with authoritative answers embedded in the markup. When an agent encounters it, extraction is deterministic: the name field of each Question is the query candidate; the text field of the Answer is the citation body. Without FAQPage, the agent has to infer a question-and-answer shape from prose headings and surrounding paragraphs — workable, but more error-prone and less precise. FAQPage removes the inference step entirely.
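The deterministic extraction described above fits in a few lines — a hypothetical walk over a FAQPage block that pairs each Question's name with its acceptedAnswer text, handling both a bare object and one nested in an @graph array:

```python
def extract_qa_pairs(json_ld: dict) -> list:
    """Return (question, answer) pairs from a FAQPage JSON-LD block."""
    nodes = json_ld.get("@graph", [json_ld])
    pairs = []
    for node in nodes:
        if node.get("@type") != "FAQPage":
            continue
        for question in node.get("mainEntity", []):
            answer = question.get("acceptedAnswer", {})
            pairs.append((question.get("name", ""), answer.get("text", "")))
    return pairs
```

No heading heuristics, no paragraph-boundary guessing: every pair is read directly from the markup.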

Structured Data cross-cuts AEO Readiness. AEO Readiness measures whether the body of a docs page is shaped for direct-answer extraction — question-style headings, concise answers, immediate answer placement. Structured Data's FAQPage check measures whether the page also marks up those questions and answers with schema. The two categories reward the same underlying content shape, from different angles.

Organization schema surfaces the company identity in agent answers. An Organization block with name, url, logo, and sameAs gives agents the authoritative cross-reference map: this page belongs to this company, which is also reachable at these external identifiers. Without it, agents infer company identity from page titles, anchor text, and domain parsing — accurate most of the time, but fragile after a rebrand, acquisition, or subdomain restructure.

Every check in this category tests a structural property, not a quality judgment. JSON-LD either exists in the raw HTML response or it doesn't. mainEntity is either populated or it isn't. The rubric is checking parse-correctness against the schema.org specification — see the determinism property at /methodology for how this shapes reproducible scoring.
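Parse-correctness checks of this kind can be sketched as a table of minimum fields per schema type. The table below covers only the types this article discusses and treats an empty value the same as a missing one, matching the empty-mainEntity failure described earlier — it is illustrative, not the rubric's canonical field list:

```python
# Minimum required fields per schema type, as this article describes them.
# Illustrative only -- not the rubric's canonical list.
REQUIRED_FIELDS = {
    "FAQPage": ["mainEntity"],
    "Organization": ["name"],
    "WebSite": ["url"],
}


def missing_fields(node: dict) -> list:
    """Return the required fields a JSON-LD node lacks or leaves empty."""
    missing = []
    for field in REQUIRED_FIELDS.get(node.get("@type"), []):
        value = node.get(field)
        if value is None or value == [] or value == "":
            missing.append(field)
    return missing
```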

What a high-scoring page looks like

A page that passes all four Structured Data checks carries a JSON-LD block with at least three schema types, each with its required fields. The example below uses an @graph array — the recommended pattern when a page carries multiple schema types in a single script element:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "How do I authenticate with the API?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Pass your API key in the Authorization header as a Bearer token. Keys are scoped per environment — use the sandbox key during development and the production key before deploying."
          }
        },
        {
          "@type": "Question",
          "name": "What formats does the endpoint accept?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "The endpoint accepts JSON and multipart/form-data. Set the Content-Type header accordingly. XML is not supported."
          }
        },
        {
          "@type": "Question",
          "name": "How are rate limits enforced?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Requests are rate-limited to 100 per minute per API key. Responses include X-RateLimit-Remaining and Retry-After headers. Exceeded requests receive 429 Too Many Requests."
          }
        }
      ]
    },
    {
      "@type": "Organization",
      "@id": "https://your-product.com/#organization",
      "name": "Your Product",
      "url": "https://your-product.com",
      "logo": {
        "@type": "ImageObject",
        "url": "https://your-product.com/logo.svg"
      },
      "sameAs": [
        "https://github.com/your-product",
        "https://x.com/yourproduct"
      ]
    },
    {
      "@type": "WebSite",
      "@id": "https://your-product.com/#website",
      "url": "https://your-product.com",
      "name": "Your Product Docs"
    }
  ]
}
</script>

What earns the score against each check:

  • schema.json_ld passes because the <script type="application/ld+json"> block is in the raw HTML — not injected after JavaScript executes.
  • schema.fields_valid passes because FAQPage has mainEntity populated with at least one Question, Organization has name set, and WebSite has url set.
  • schema.type_breadth passes at full credit because three distinct @type values are present: FAQPage, Organization, WebSite.
  • schema.faq_present passes because FAQPage is one of the types in the graph.

The sameAs array on Organization is not required for a rubric pass, but it is the signal that connects the company to external identity sources. Agents building knowledge entries for a product read sameAs as the authoritative cross-reference list. The @id fields stabilize the graph across pages — when the same Organization block appears on ten pages, the identical @id URI tells agents these are one entity, not ten independent records.

Two of the three types in this block are sitewide: Organization and WebSite belong in the layout template, not per-page. FAQPage belongs on pages that already have question-style headings and direct answers — the content is usually already there; the schema is what's missing.

Common failure modes

The four patterns the rubric flags most often, in descending order of frequency:

  • No JSON-LD at all. The scan returns: "No JSON-LD structured data found — AI engines use schema markup to understand what your site is about." The entire Structured Data category scores 0, and the fields and breadth checks mark as dependent. This is the most common Structured Data failure — not because it's difficult to implement, but because it was never added. The recovery is one <script type="application/ld+json"> block in the layout template.
  • Required fields missing. The scan returns: "Schema missing required fields: {field list}." Typical instances: Organization without a name field; FAQPage with an empty mainEntity array; Question with no acceptedAnswer. The JSON-LD presence check passes — the block exists — but the fields check fails. Validate against https://validator.schema.org/ before marking the implementation complete.
  • Too few schema types. The scan returns: "Only 1 schema type found — adding Organization, WebSite, or BreadcrumbList helps AI identify your site." Docs sites that ship only a generic Article or TechArticle type on every page satisfy JSON-LD presence but score partial on breadth. The fix is additive: Organization and WebSite are sitewide types that belong in the layout template, not per-page.
  • FAQPage absent. The scan returns: "No FAQPage schema — AI engines lean on FAQPage to extract direct answers." FAQPage is the check most directly tied to AI-agent citation quality. Its absence doesn't block citation entirely, but it removes the most reliable extraction path for direct answers. Pages that already have question-style headings and answers need only the schema wrapper — the content is already there.

One additional failure the rubric's field validation catches but surfaces less visibly: @type values that don't match canonical schema.org casing — Faqpage instead of FAQPage, organisation instead of Organization. The JSON-LD parses without syntax errors, but the type is unrecognized. Validate at https://validator.schema.org/ — mismatched casing fails there before it fails in production.
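A case-sensitive membership check catches the casing half of this failure class before the validator does. The canonical set below lists only the types this article mentions, and pure spelling variants like organisation still need the validator — this sketch detects case mismatches only:

```python
# Canonical schema.org type names used in this article; the real
# vocabulary is far larger.
CANONICAL_TYPES = {"FAQPage", "Organization", "WebSite", "Article",
                   "BreadcrumbList", "Question", "Answer", "ImageObject"}


def suggest_type_casing(type_name: str):
    """Return the canonical spelling when the name differs only in case;
    None when it already matches or is an unknown type."""
    if type_name in CANONICAL_TYPES:
        return None  # already canonical
    for canonical in CANONICAL_TYPES:
        if canonical.lower() == type_name.lower():
            return canonical  # case mismatch: suggest the fix
    return None  # unknown type or spelling variant: defer to the validator
```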

All five patterns are recoverable in a single markup pass.

What you can do today

Three steps:

  1. Run a free Lightning Scan of your site. The per-category breakdown surfaces your Structured Data score with the individual check results — JSON-LD presence, required fields, schema-type breadth, FAQPage — so you know exactly which of the four properties your pages pass or fail.
  2. Read the related deep-topic articles in /docs. When they ship, /docs/faqpage-schema covers FAQPage schema in detail — question structure, answer length, and how AI agents extract from it. /docs/breadcrumb-navigation covers BreadcrumbList, which contributes to schema-type breadth here and earns its own deep treatment in the Site Architecture / Navigation category.
  3. Read /methodology. It is the canonical reference for category weights, hard caps, and the determinism property that governs how the rubric produces the same Score for the same pages on every run.

Last reviewed against AI Readiness rubric v4.0.