Technical Guide

    The Technical Checklist: Is Your Website AI-Ready?

    Ross Williams · 10 min read · Tuesday, 31st March 2026

    20+ technical elements determine whether AI crawlers can extract information from your site. Use this checklist to audit your website and fix gaps...

    Summary: Most websites were built and optimised for Google's algorithm, not for large language models. AI crawlers have different requirements than Googlebot. They need clearer structure, better semantic markup, and more explicit metadata. This checklist covers 20+ technical elements that determine whether AI systems can extract and recommend your content.

    Why Technical Setup Matters for AI

    Key Insight

    AI crawlers (GPTBot, ClaudeBot, PerplexityBot, and others) face a different challenge than Google. Googlebot needs to understand your site well enough to rank it. AI crawlers need to understand your site well enough to extract useful information and include it in generative responses.

    This distinction matters because:

    1. AI Crawlers Need Clearer Hierarchy

    Google's algorithm can infer topical relevance from various signals. AI crawlers need your content hierarchy to be explicit. If an article about "demand generation" has 7 H2 sections of equal weight, the LLM struggles to determine what's most important. If one section is clearly marked as the core definition and others are clearly marked as subcategories, the extraction is cleaner.

    2. AI Systems Value Semantic Markup

    Google ranking doesn't depend heavily on semantic HTML (using <article>, <nav>, <section> tags). AI crawlers depend on it. Properly semantic HTML helps AI systems understand which content is primary, which is navigation, and which is sidebar information.

    3. AI Crawlers Need Explicit Metadata

    Google can infer article dates, author information, and content type from various signals. AI crawlers rely more heavily on explicit metadata in <head> tags. A missing publication date in your markup makes it harder for LLMs to assess freshness.

    4. AI Systems Extract and Synthesise Differently

    Google's algorithm creates a ranking score. AI systems extract information. This means your content needs to be extractable as distinct, quotable, attributable pieces. Long paragraphs of prose are harder to extract than structured lists or clearly delimited sections.

    Crawlability and Indexation

    Key Insight

    Before any of your content can be included in AI responses, AI crawlers need to be able to find and read your pages.


    1. Robots.txt Configuration

    Your robots.txt file should explicitly allow AI crawlers. Create rules for:

    User-agent: GPTBot
    Allow: /
    
    User-agent: CCBot
    Allow: /
    
    User-agent: PerplexityBot
    Allow: /
    
    User-agent: Google-Extended
    Allow: /
    
    User-agent: *
    Disallow: /admin/
    Disallow: /api/
    Disallow: /private/
    

    If you block specific crawlers, you're preventing your content from being included in their generated responses. For most businesses, allowing these crawlers is beneficial.
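    You can spot-check this configuration with Python's stdlib urllib.robotparser, asking whether each AI crawler may fetch a given path. A minimal sketch (the bot list mirrors the rules above; point it at your own robots.txt):

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "CCBot", "PerplexityBot", "ClaudeBot"]

def check_ai_access(robots_txt, path="/"):
    """Return {bot: allowed} for each AI crawler against a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}

robots = """User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

# GPTBot matches its own Allow rule; the others fall through to the * group.
print(check_ai_access(robots, "/"))
print(check_ai_access(robots, "/admin/"))
```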

    2. Sitemap Configuration

    Maintain an updated XML sitemap with:

    • All important content pages
    • Latest publication date (lastmod) for each page
    • Content priority (priority tags)
    • Language information (hreflang tags for international sites)

    The sitemap should be less than 50MB and contain fewer than 50,000 URLs per file. For larger sites, use sitemap index files.
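    Sitemap freshness can be audited with a short script. The sketch below (stdlib only; the 90-day threshold is an illustrative assumption) lists URLs whose lastmod is missing or stale:

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def stale_urls(sitemap_xml, max_age_days=90, today=None):
    """Return sitemap URLs whose <lastmod> is missing or older than the cutoff."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for url in ET.fromstring(sitemap_xml).findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(loc)
    return stale
```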

    3. Canonical Tags

    Every page should have a canonical tag pointing to itself (or to the canonical version if you have duplicate content). This helps crawlers identify the authoritative version of content.

    <link rel="canonical" href="https://example.com/definitive-guide-demand-generation" />
    

    4. Noindex and Nofollow Usage

    Selectively use noindex for:

    • Thin content pages (tag pages, archive pages)
    • Duplicate content that shouldn't rank independently
    • Pages in testing or development

    But do NOT noindex:

    • Your primary content pages
    • Pillar pages and cluster content
    • Pages you want included in AI Overviews

    Use nofollow sparingly. For internal links, it's rarely necessary. Use it for:

    • Links to paid or sponsored content
    • Links to user-generated content you don't endorse
    • Links to low-quality external sites

    5. Meta Robots Tag

    Your <head> should include:

    <meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1">
    

    The snippet and preview settings tell crawlers they can use your content generously in responses.

    Semantic HTML and Structure

    Key Insight

    AI crawlers understand content hierarchy through both semantic HTML tags and heading structure. Your content should use:

    6. Proper Heading Hierarchy

    • Every page has exactly one H1 tag (the main title)
    • H2 tags break the page into major sections
    • H3 tags break H2 sections into subsections
    • Headings should be descriptive and follow a logical hierarchy (no skipping from H2 to H4)

    Bad hierarchy:

    <h1>Ultimate Guide to Demand Generation</h1>
    <h2>What is Demand Gen?</h2>
    <h2>Demand Gen Tools</h2>
    <h4>Marketo</h4> <!-- Missing H3 -->
    

    Good hierarchy:

    <h1>Ultimate Guide to Demand Generation</h1>
    <h2>What is Demand Generation?</h2>
    <h3>Demand Gen vs Lead Gen</h3>
    <h2>Demand Generation Tools</h2>
    <h3>Marketo</h3>
    <h4>Marketo Pricing</h4>
    
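    At scale, hierarchy problems are easier to catch with a script than by eye. A rough sketch using Python's stdlib html.parser that flags skipped levels like the H2-to-H4 jump above:

```python
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    """Record places where the heading level jumps by more than one."""

    def __init__(self):
        super().__init__()
        self.previous = 0      # level of the last heading seen
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            # Going deeper is fine only one level at a time.
            if self.previous and level > self.previous + 1:
                self.problems.append(f"h{self.previous} followed by h{level}")
            self.previous = level

def audit_headings(html):
    auditor = HeadingAudit()
    auditor.feed(html)
    return auditor.problems

print(audit_headings("<h1>T</h1><h2>A</h2><h4>B</h4>"))  # ['h2 followed by h4']
```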

    7. Semantic HTML Tags

    Use semantic tags to mark up content:

    <article> <!-- Main content wrapper -->
      <header> <!-- Article header -->
        <h1>Title</h1>
        <p>Meta information</p>
      </header>
    
      <nav> <!-- Table of contents -->
        <ul>...</ul>
      </nav>
    
      <section> <!-- Major section -->
        <h2>Section Title</h2>
        <p>Content...</p>
      </section>
    
      <footer> <!-- Article footer -->
        <p>Author bio, related articles</p>
      </footer>
    </article>
    

    8. List Markup for Structured Content

    Use <ul> (unordered) or <ol> (ordered) lists for:

    • Bullet points and numbered lists
    • Feature lists
    • Step-by-step instructions
    • Comparison points

    <ol>
      <li>First step in the process</li>
      <li>Second step</li>
      <li>Third step</li>
    </ol>
    

    LLMs extract properly marked-up lists far more cleanly than list-like content written as plain prose.

    9. Table Markup for Comparisons and Data

    Use <table> for structured data:

    <table>
      <thead>
        <tr>
          <th>Feature</th>
          <th>Platform A</th>
          <th>Platform B</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>Price per seat</td>
          <td>$50</td>
          <td>$75</td>
        </tr>
      </tbody>
    </table>
    

    Tables are extracted more accurately by LLMs than prose comparisons.

    10. Emphasis and Strong Tag Usage

    Use <em> for emphasis and <strong> for strong emphasis:

    <p>Marketing automation is <strong>essential</strong> for modern B2B teams.</p>
    

    Don't use CSS to style text as bold or italic; use semantic tags.

    Schema Markup and Structured Data

    Key Insight

    Schema.org markup helps crawlers understand what type of content you're publishing and what it contains.

    11. Article Schema

    Every long-form article should include Article schema in JSON-LD format:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "The Technical Checklist: Is Your Website AI-Ready?",
      "description": "20+ technical elements that determine AI readiness",
      "image": "https://example.com/image.jpg",
      "datePublished": "2026-03-15",
      "dateModified": "2026-03-20",
      "author": {
        "@type": "Person",
        "name": "Author Name",
        "url": "https://example.com/authors/author-name"
      },
      "publisher": {
        "@type": "Organization",
        "name": "Your Company",
        "logo": {
          "@type": "ImageObject",
          "url": "https://example.com/logo.png"
        }
      }
    }
    </script>
    

    12. FAQPage Schema

    If your article includes an FAQ section, use FAQPage schema:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "How do I know if my website is AI-ready?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Run through this checklist of 20+ technical elements..."
          }
        }
      ]
    }
    </script>
    

    13. BreadcrumbList Schema

    Help AI systems understand your site structure:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://example.com/"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Resources",
          "item": "https://example.com/resources"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "Article Title",
          "item": "https://example.com/article-title"
        }
      ]
    }
    </script>
    

    14. HowTo Schema

    For instructional content:

    {
      "@context": "https://schema.org",
      "@type": "HowTo",
      "name": "How to Implement Demand Generation",
      "step": [
        {
          "@type": "HowToStep",
          "name": "Define your target audience",
          "text": "..."
        },
        {
          "@type": "HowToStep",
          "name": "Develop content strategy",
          "text": "..."
        }
      ]
    }
    

    15. Organization Schema

    Your homepage should include Organization schema:

    {
      "@context": "https://schema.org",
      "@type": "Organization",
      "name": "Your Company Name",
      "url": "https://example.com",
      "logo": "https://example.com/logo.png",
      "description": "What your company does",
      "sameAs": [
        "https://linkedin.com/company/...",
        "https://twitter.com/..."
      ],
      "contactPoint": {
        "@type": "ContactPoint",
        "contactType": "Sales",
        "telephone": "+1-XXX-XXX-XXXX"
      }
    }
    

    16. Author Schema

    Create dedicated author pages with author schema:

    {
      "@context": "https://schema.org",
      "@type": "Person",
      "name": "Author Name",
      "jobTitle": "Director of Demand Generation",
      "worksFor": {
        "@type": "Organization",
        "name": "Your Company"
      },
      "url": "https://example.com/authors/author-name",
      "sameAs": [
        "https://linkedin.com/in/..."
      ]
    }
    
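    To audit the schema types above across many pages, you can extract JSON-LD blocks with the stdlib parser. A sketch (it assumes each block is valid JSON; real pages may need more defensive parsing):

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collect parsed objects from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks.append(json.loads(data))

def schema_types(html):
    collector = JsonLdCollector()
    collector.feed(html)
    return [block.get("@type") for block in collector.blocks]
```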

    Metadata and Open Graph Tags

    Key Insight

    Your <head> should contain explicit metadata that tells AI crawlers about your content.

    17. Title and Meta Description

    Every page needs:

    <title>Title | Your Company</title>
    <meta name="description" content="One-line summary of the page, 150-160 characters">
    

    The title should be SEO-optimised. The description should be a natural summary (not keyword-stuffed).

    18. Open Graph Tags

    Help AI systems understand what your content is about:

    <meta property="og:type" content="article">
    <meta property="og:title" content="The Technical Checklist: Is Your Website AI-Ready?">
    <meta property="og:description" content="One-line summary">
    <meta property="og:image" content="https://example.com/image.jpg">
    <meta property="og:url" content="https://example.com/article-url">
    

    19. Twitter Card Tags

    For consistency on social platforms:

    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:title" content="Article Title">
    <meta name="twitter:description" content="Summary">
    <meta name="twitter:image" content="https://example.com/image.jpg">
    

    20. Language and Charset Declaration

    Every page should have:

    <meta charset="UTF-8">
    <html lang="en">
    

    For international sites, use hreflang tags to indicate language versions.
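    The tag checks in this section lend themselves to automation. A rough sketch that reports which metadata fields are missing from a page's markup (the REQUIRED set here is an illustrative subset, not an exhaustive list):

```python
from html.parser import HTMLParser

# Illustrative subset of the metadata discussed above.
REQUIRED = {"description", "og:title", "og:description", "og:image", "og:url"}

class MetaAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = set()
        self.has_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.has_title = True
        elif tag == "meta":
            # Standard meta tags use name=...; Open Graph tags use property=...
            self.found.add(a.get("name") or a.get("property"))

def missing_metadata(html):
    audit = MetaAudit()
    audit.feed(html)
    missing = REQUIRED - audit.found
    if not audit.has_title:
        missing.add("<title>")
    return sorted(missing)
```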

    Page Performance and Core Web Vitals

    Key Insight

    AI crawlers assess content quality partially through page performance. Slow pages signal lower quality to ranking and crawling systems.

    21. Core Web Vitals

    Optimise for Google's three Core Web Vitals:

    • Largest Contentful Paint (LCP): The largest content element should render in under 2.5 seconds
    • Cumulative Layout Shift (CLS): The page shouldn't shift unexpectedly as it loads; target < 0.1
    • Interaction to Next Paint (INP): The page should respond to interactions within 200ms; INP replaced First Input Delay (FID) as a Core Web Vital in March 2024

    22. Page Speed Optimisation

    • Compress images (use modern formats like WebP)
    • Minify CSS and JavaScript
    • Implement lazy loading for images and video
    • Use a CDN to serve content from locations near users
    • Remove render-blocking resources
    • Optimise third-party scripts (analytics, ads)

    23. Mobile Responsiveness

    AI crawlers access your site from various devices and viewport sizes. Ensure:

    • Your site is fully responsive to all screen sizes
    • Text is readable without zooming
    • Buttons and links are touch-friendly (48px minimum)
    • Navigation works on mobile

    Test with Google's Mobile-Friendly Test and across various real devices.

    Content Accessibility and Readability

    Key Insight

    AI crawlers benefit from content that follows accessibility standards (WCAG 2.1 Level AA).

    24. Image Alt Text

    Every image should have descriptive alt text:

    <img src="audience-analysis.jpg" alt="Sample audience segmentation showing demand generation targets">
    

    Alt text helps AI crawlers understand what's in images and serves blind and low-vision users.
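    This is another check that automates well. A sketch using the stdlib parser that flags images whose alt attribute is missing or empty:

```python
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    """Collect the src of every <img> with a missing or empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if not a.get("alt"):
                self.missing.append(a.get("src", "?"))

def images_without_alt(html):
    audit = AltAudit()
    audit.feed(html)
    return audit.missing
```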

    25. Color Contrast

    Text should have sufficient contrast against backgrounds (4.5:1 ratio for normal text, 3:1 for large text).
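    The contrast ratio is computable directly from the WCAG 2.1 formulas: convert each channel to linear light, combine into relative luminance, then take (L1 + 0.05) / (L2 + 0.05). A worked sketch:

```python
def _luminance(rgb):
    """WCAG 2.1 relative luminance of an (R, G, B) colour, 0-255 per channel."""
    def linear(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    lighter, darker = sorted((_luminance(fg), _luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white is the maximum possible contrast, exactly 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
```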

    26. Focus Indicators

    Interactive elements should have visible focus states for keyboard navigation.

    27. Language and Readability Metrics

    AI crawlers can assess readability. Use:

    • Clear sentence structure
    • Short paragraphs (3-5 sentences)
    • Active voice where possible
    • Jargon defined when first introduced

    Aim for 8th-9th grade reading level for B2B content (high school education level). Tools like Hemingway App and Readability Statistics can help.
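    Reading grade can be estimated with the Flesch-Kincaid formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below uses a crude vowel-group syllable counter, so treat its output as a rough signal rather than an exact grade:

```python
import re

def syllables(word):
    # Crude heuristic: count groups of consecutive vowels (minimum 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Approximate Flesch-Kincaid grade level for a block of prose."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllable_count = sum(syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllable_count / len(words))
            - 15.59)
```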

    AI-Specific Technical Requirements

    Key Insight

    In addition to meta robots tags, you can send the same crawl directives via the X-Robots-Tag HTTP header.

    28. X-Robots-Tag HTTP Header

    In addition to meta robots tags, you can use HTTP headers:

    X-Robots-Tag: index, follow, max-snippet:-1, max-image-preview:large
    

    29. Allow AI Crawlers in Robots.txt

    Explicitly allow major AI crawlers:

    • GPTBot (OpenAI)
    • CCBot (Common Crawl)
    • PerplexityBot (Perplexity)
    • Google-Extended (Google's control token for generative AI training)
    • ClaudeBot (Anthropic)

    30. Monitor Crawl Patterns

    Use Google Search Console and your server logs to monitor:

    • Which crawlers are accessing your site
    • How frequently they crawl
    • Any crawl errors
    • Coverage issues (pages that aren't being indexed)

    Look for GPTBot, ClaudeBot, and PerplexityBot requests in your server logs. Note that Google-Extended is a robots.txt control token rather than a separate crawler, so it won't appear as a user agent in your logs.
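    A sketch of a server-log scan for AI crawler traffic (log formats vary by server, and matching on user-agent substrings is a simplification):

```python
from collections import Counter

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

def count_ai_hits(log_lines):
    """Tally access-log lines by the AI crawler named in the user-agent string."""
    hits = Counter()
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent in line:
                hits[agent] += 1
    return hits
```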

    31. Content Freshness

    Include publication and modification dates in:

    • Article schema JSON-LD
    • Your <head> meta tags
    • Visibly on the page (helps both users and crawlers)

    Update modification dates when you revise content significantly.

    Audit and Implementation Checklist

    Crawlability (Priority: Critical)

    • Robots.txt allows GPTBot, ClaudeBot, PerplexityBot, CCBot, and Google-Extended
    • XML sitemap exists and is updated monthly
    • Canonical tags on all content pages
    • No critical crawl errors in Search Console
    • Important content is not noindex

    Structure (Priority: Critical)

    • Every page has exactly one H1
    • Headings follow logical hierarchy (H1 → H2 → H3)
    • Content uses semantic HTML (<article>, <section>, <nav>)
    • Lists are marked up with <ul> or <ol>
    • Tables use <table> with <thead> and <tbody>

    Schema Markup (Priority: High)

    • Article schema on all long-form content
    • FAQPage schema if you have Q&A sections
    • Organization schema on homepage
    • Author schema on author pages
    • BreadcrumbList schema for site navigation
    • Validate all schema with Google's Rich Results Test or the Schema.org validator

    Metadata (Priority: High)

    • Every page has unique title tag (50-60 characters)
    • Every page has meta description (150-160 characters)
    • Open Graph tags (og:title, og:description, og:image, og:url)
    • Publication date visible and in schema
    • Last modified date updated when content changes

    Performance (Priority: High)

    • LCP < 2.5 seconds
    • CLS < 0.1
    • Mobile-Friendly Test passes
    • Images optimised and lazy-loaded
    • CSS/JS minified

    Accessibility (Priority: Medium)

    • All images have alt text
    • Text contrast ratio ≥ 4.5:1
    • Interactive elements are keyboard navigable
    • Reading level appropriate (8th-9th grade for B2B)

    Content Quality (Priority: High)

    • No duplicate content without canonical tags
    • Thin content is noindexed or consolidated
    • Promotional content clearly marked
    • Citations for all non-obvious claims
    • Author credentials included

    Frequently Asked Questions

    Which pages should I fix first?

    Priority is 1) your top 50 content pages (your money pages and pillar content), 2) your homepage and main category pages, 3) your blog archive. Secondary pages can follow best practices gradually.

    Will these changes also improve my Google rankings?

    Probably, but these changes primarily affect AI visibility. Some elements (Core Web Vitals, mobile-friendliness) do impact Google ranking. Others (semantic HTML, schema markup) affect AI inclusion more directly than Google ranking.

    Where should I start if time is limited?

    Start with schema markup. Most of your existing content already has proper heading hierarchy and semantic HTML (or can be fixed with search-and-replace). Adding Article schema, FAQPage schema, and Organization schema takes 2-3 hours per 100 pages.

    Should I block Google-Extended?

    Only if you explicitly don't want your content included in Google's AI Overviews. For most businesses, allowing it is beneficial.

    How often should I update my sitemap?

    For blogs and frequently updated content, daily or weekly. For static content, monthly is fine. Ensure your CMS automatically updates the sitemap when you publish new pages.

    Can these changes hurt my existing Google rankings?

    Mostly not. These changes are neutral to positive for Google ranking. The only potential negative would be if you break heading hierarchy or introduce duplicate content issues.

    Ross Williams

    Ross Williams is the founder of Fortitude Media, specialising in AI visibility and content strategy for B2B companies.

    Related Articles

    AI Optimisation for B2B vs B2C: Key Differences
    B2B and B2C businesses optimise for AI differently. Learn how citation patterns, authority signals, decision complexity, and content types differ between segments.

    Building Topic Clusters That AI Understands
    Topic clusters work for traditional SEO, but AI systems require denser, more explicitly linked clusters. Learn architecture, internal linking, and how LLMs map topical relationships.

    How AI Crawlers Differ from Google's Spiders — and Why It Changes Everything
    GPTBot, ClaudeBot, and PerplexityBot crawl differently than Googlebot. Learn the technical differences, robots.txt implications, and how to optimise for both simultaneously.
