Understanding Your Score

Extractability

How well AI systems can parse and extract structured information from your pages.

Overview

What Extractability Measures

Extractability evaluates whether your content is structured in ways that AI systems can efficiently parse. It looks at structured data markup, semantic HTML elements, image descriptions, content organization, metadata, and access barriers such as login walls.

Think of it as: "Can AI systems find the important information on your page without guessing?" Pages with strong extractability make it easy for AI to identify what your content is about, pull out key facts, and understand the relationships between different pieces of information.

Dimensions

Key Dimensions

Extractability evaluates six dimensions of how well your content is structured for machine parsing.

  • Structured Data (JSON-LD / Schema.org): Does your site use structured markup such as Product, FAQ, or Article schemas? JSON-LD tells AI systems the type and structure of your content so they do not have to infer it from the page layout.
  • Semantic HTML: Do your pages use meaningful tags such as article, section, and main? Semantic HTML gives AI systems clear signals about content hierarchy and purpose.
  • Image Alt Text: Do images have descriptive alt attributes? Alt text helps AI understand visual content and include image-related information in its responses.
  • Content Structure: Is content organized with lists, tables, and clear headings? Well-structured content is easier for AI to parse into discrete, citable facts.
  • Open Graph & Meta: Are sharing metadata and canonical URLs properly set? OG tags and canonical URLs help AI systems identify the authoritative version of your content and understand how to present it.
  • Access Barriers: Are there login walls preventing content access? Login walls are a dealbreaker and severely cap your score. Cookie consent walls have only a minor impact, as most AI agents can bypass them. Noindex tags on pages also reduce your score in this dimension.
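
The noindex signal mentioned under Access Barriers is easy to check locally. The following is a minimal Python sketch, not the audit's actual implementation: it assumes the directive is set through a meta robots tag and uses only the standard library. The names RobotsMetaParser and has_noindex are hypothetical, and directives sent through the X-Robots-Tag HTTP header are not covered.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the page declares noindex in a meta robots tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Calling has_noindex on the HTML of a key page gives a quick yes/no answer before you run a full audit.
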

Troubleshooting

Common Issues

These are the most frequent reasons for a low Extractability score.

  • Missing JSON-LD structured data: This is the most common gap. Without structured data, AI systems must guess the type and structure of your content from the raw HTML, which is less reliable.
  • Images without alt text: Missing alt attributes hurt both AI parsing and accessibility. AI systems cannot understand what an image depicts without a text description.
  • Content in div soup without semantic HTML: Pages built entirely with generic div tags give AI no structural clues. Semantic tags like article and nav are much more informative.
  • Missing or incomplete Open Graph meta tags: Without og:title, og:description, and og:image, AI systems and social platforms cannot reliably preview or summarize your pages.
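
The four issues above can be spotted with a rough local scan. This is a sketch under stated assumptions rather than the scoring logic the audit uses: ExtractabilityScan, find_issues, and the SEMANTIC_TAGS set are hypothetical names, and an empty alt attribute is treated as missing even though it can be legitimate for purely decorative images.

```python
from html.parser import HTMLParser

# Assumed tag and property sets, taken from the examples in this guide.
SEMANTIC_TAGS = {"article", "section", "main", "nav", "figure"}
REQUIRED_OG = {"og:title", "og:description", "og:image"}


class ExtractabilityScan(HTMLParser):
    """Records the presence of the signals discussed above."""

    def __init__(self):
        super().__init__()
        self.has_json_ld = False
        self.images_missing_alt = 0
        self.semantic_tags = set()
        self.og_properties = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self.has_json_ld = True
        elif tag == "img" and not (attr.get("alt") or "").strip():
            # Simplification: empty alt text counts as missing.
            self.images_missing_alt += 1
        elif tag in SEMANTIC_TAGS:
            self.semantic_tags.add(tag)
        elif tag == "meta" and attr.get("property") in REQUIRED_OG and attr.get("content"):
            self.og_properties.add(attr["property"])


def find_issues(html: str) -> list[str]:
    """Return human-readable descriptions of the issues found in html."""
    scan = ExtractabilityScan()
    scan.feed(html)
    issues = []
    if not scan.has_json_ld:
        issues.append("missing JSON-LD structured data")
    if scan.images_missing_alt:
        issues.append(f"{scan.images_missing_alt} image(s) without alt text")
    if not scan.semantic_tags:
        issues.append("no semantic HTML tags found")
    missing_og = sorted(REQUIRED_OG - scan.og_properties)
    if missing_og:
        issues.append("missing Open Graph tags: " + ", ".join(missing_og))
    return issues
```

An empty list from find_issues means none of these four gaps were detected; it does not say anything about how the page would score.
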

Action Plan

How to Improve

Follow these steps to improve your Extractability score, ordered by typical impact.

  1. Add JSON-LD structured data to key pages
    Implement Product, FAQ, Article, or HowTo schemas depending on your content type. JSON-LD is the preferred format because it is easy to add without changing your HTML structure.
  2. Use semantic HTML tags
    Replace generic div containers with meaningful tags: article, section, main, figure. These give AI systems clear signals about content hierarchy.
  3. Add descriptive alt text to all important images
    Write alt text that describes what the image shows and why it matters in context. Avoid generic phrases like "image" or "screenshot" — be specific.
  4. Organize content with lists and tables
    Where appropriate, present information in HTML lists and tables rather than prose paragraphs. Structured formats are easier for AI to parse into discrete, quotable facts.
  5. Set OG tags and canonical URLs on every page
    Add og:title, og:description, og:image, and a canonical URL to every page. These metadata signals help AI systems identify and correctly attribute your content.
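
Steps 1 and 5 can be combined in a small template helper. The sketch below is illustrative only: build_head_tags and its parameters are hypothetical, the Article type is just one of the schemas named in step 1, and the property set is deliberately minimal. It emits a canonical link, the three OG tags, and a JSON-LD script for placement in the page head.

```python
import json
from html import escape


def build_head_tags(title, description, image_url, canonical_url):
    """Return canonical, Open Graph, and JSON-LD tags as an HTML string."""
    og = {
        "og:title": title,
        "og:description": description,
        "og:image": image_url,
    }
    lines = [f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">']
    lines += [
        f'<meta property="{prop}" content="{escape(value, quote=True)}">'
        for prop, value in og.items()
    ]
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Article",  # assumption: swap for Product, HowTo, etc. as needed
        "headline": title,
        "description": description,
        "image": image_url,
        "mainEntityOfPage": canonical_url,
    }
    # Escape "</" so the JSON cannot close the script element early.
    payload = json.dumps(json_ld, indent=2).replace("</", "<\\/")
    lines.append(f'<script type="application/ld+json">\n{payload}\n</script>')
    return "\n".join(lines)
```

Generating all three kinds of metadata from the same inputs keeps the title, description, and URL consistent across them, which is the point of step 5.
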

FAQ

Frequently Asked Questions

  • Why is my Extractability score low when my content is good?
    Extractability measures how well your content is structured for machine parsing, not content quality itself. You may have excellent writing but lack structured data markup, semantic HTML tags, or image alt text. These technical elements help AI systems find and parse your content efficiently. Content Quality is a separate dimension that evaluates the substance of your writing.
  • Which Schema.org types matter most?
    The most impactful Schema.org types depend on your site. For SaaS products, use Product and FAQPage. For content sites, use Article and HowTo. For services, use LocalBusiness or Organization. Start with the types that best describe your primary pages and expand from there. Your AURA audit report will highlight which schemas are missing from your specific pages.
  • Does Extractability affect how AI chatbots see my site?
    Yes. AI chatbots like ChatGPT, Perplexity, and Claude rely on structured signals to understand and extract information from web pages. Sites with strong Extractability scores are easier for these systems to parse, which increases the likelihood of your content being accurately cited in AI-generated responses.
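
For the FAQPage type mentioned in the Schema.org answer above, the markup follows a simple pattern: a list of Question items, each with an accepted Answer. The Python sketch below builds that structure from question and answer pairs; faq_page_json_ld is a hypothetical helper, and only the core Schema.org properties are included.

```python
import json


def faq_page_json_ld(questions_and_answers):
    """Return FAQPage JSON-LD for an iterable of (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in questions_and_answers
        ],
    }
    return json.dumps(data, indent=2)
```

The returned string is meant to be placed inside a script element with type application/ld+json on the page that shows the same questions and answers.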

Check your Extractability score

Run a Pro or Agency audit to see how well AI systems can parse your content.
