The Technical Checklist: Is Your Website AI-Ready?
20+ technical elements determine whether AI crawlers can extract information from your site. Use this checklist to audit your website and fix gaps...

Summary: Most websites were built and optimised for Google's algorithm, not for large language models. AI crawlers have different requirements than Googlebot. They need clearer structure, better semantic markup, and more explicit metadata. This checklist covers 20+ technical elements that determine whether AI systems can extract and recommend your content.
Why Technical Setup Matters for AI
AI crawlers (GPTBot, ClaudeBot, PerplexityBot, and others) face a different challenge than Google. Googlebot needs to understand your site well enough to rank it. AI crawlers need to understand your site well enough to extract useful information and include it in generative responses.
This distinction matters because:
1. AI Crawlers Need Clearer Hierarchy
Google's algorithm can infer topical relevance from various signals. AI crawlers need your content hierarchy to be explicit. If an article about "demand generation" has 7 H2 sections of equal weight, the LLM struggles to determine what's most important. If one section is clearly marked as the core definition and others are clearly marked as subcategories, the extraction is cleaner.
2. AI Systems Value Semantic Markup
Google ranking doesn't depend heavily on semantic HTML (using <article>, <nav>, <section> tags). AI crawlers depend on it. Properly semantic HTML helps AI systems understand which content is primary, which is navigation, and which is sidebar information.
3. AI Crawlers Need Explicit Metadata
Google can infer article dates, author information, and content type from various signals. AI crawlers rely more heavily on explicit metadata in <head> tags. A missing publication date in your markup makes it harder for LLMs to assess freshness.
4. AI Systems Extract and Synthesise Differently
Google's algorithm creates a ranking score. AI systems extract information. This means your content needs to be extractable as distinct, quotable, attributable pieces. Long paragraphs of prose are harder to extract than structured lists or clearly delimited sections.
Crawlability and Indexation
Before any of your content can be included in AI responses, AI crawlers need to be able to find and read your pages.
1. Robots.txt Configuration
Your robots.txt file should explicitly allow AI crawlers. Create rules for:
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
If you block specific crawlers, you're preventing your content from being included in their generated responses. For most businesses, allowing these crawlers is beneficial.
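You can verify these rules programmatically. Here is a minimal sketch using Python's standard-library `urllib.robotparser`, which applies the same matching rules the crawlers follow; the robots.txt content and test URL below are illustrative:

```python
# Sketch: check whether a robots.txt permits the major AI crawlers.
# The ROBOTS_TXT content and test URL are illustrative examples.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

AI_CRAWLERS = ["GPTBot", "CCBot", "PerplexityBot", "ClaudeBot"]

def audit_robots(robots_txt: str, url: str = "https://example.com/blog/post"):
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Crawlers without a dedicated group fall back to the "*" rules.
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}

print(audit_robots(ROBOTS_TXT))
```

Running this against your live robots.txt (via `RobotFileParser.set_url` and `read`) gives the same per-crawler answer before you deploy changes.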
2. Sitemap Configuration
Maintain an updated XML sitemap with:
- All important content pages
- Last modification date (lastmod) for each page
- Content priority (priority tags)
- Language information (hreflang tags for international sites)
The sitemap should be less than 50MB and contain fewer than 50,000 URLs per file. For larger sites, use sitemap index files.
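A sitemap with lastmod entries can be generated with nothing but the standard library. This sketch uses Python's `xml.etree.ElementTree`; the page URLs and dates are illustrative:

```python
# Sketch: build a minimal XML sitemap with lastmod entries.
# URLs and dates below are illustrative, not real pages.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (url, lastmod_iso_date) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([
    ("https://example.com/definitive-guide-demand-generation", "2026-03-20"),
])
print(xml)
```

For larger sites, the same approach extends to generating a `sitemapindex` file that points at multiple sub-sitemaps, each under the 50MB / 50,000-URL limits.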
3. Canonical Tags
Every page should have a canonical tag pointing to itself (or to the canonical version if you have duplicate content). This helps crawlers identify the authoritative version of content.
<link rel="canonical" href="https://example.com/definitive-guide-demand-generation" />
4. Noindex and Nofollow Usage
Selectively use noindex for:
- Thin content pages (tag pages, archive pages)
- Duplicate content that shouldn't rank independently
- Pages in testing or development
But do NOT noindex:
- Your primary content pages
- Pillar pages and cluster content
- Pages you want included in AI Overviews
Use nofollow sparingly. For internal links, it's rarely necessary. Use it for:
- Links to paid or sponsored content
- Links to user-generated content you don't endorse
- Links to low-quality external sites
5. Meta Robots Tag
Your <head> should include:
<meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1">
The snippet and preview settings tell crawlers they can use your content generously in responses.
Semantic HTML and Structure
AI crawlers understand content hierarchy through both semantic HTML tags and heading structure. Your content should use:
6. Proper Heading Hierarchy
- Every page has exactly one H1 tag (the main title)
- H2 tags break the page into major sections
- H3 tags break H2 sections into subsections
- Headings should be descriptive and follow a logical hierarchy (no skipping from H2 to H4)
Bad hierarchy:
<h1>Ultimate Guide to Demand Generation</h1>
<h2>What is Demand Gen?</h2>
<h2>Demand Gen Tools</h2>
<h4>Marketo</h4> <!-- Missing H3 -->
Good hierarchy:
<h1>Ultimate Guide to Demand Generation</h1>
<h2>What is Demand Generation?</h2>
<h3>Demand Gen vs Lead Gen</h3>
<h2>Demand Generation Tools</h2>
<h3>Marketo</h3>
<h4>Marketo Pricing</h4>
7. Semantic HTML Tags
Use semantic tags to mark up content:
<article> <!-- Main content wrapper -->
<header> <!-- Article header -->
<h1>Title</h1>
<p>Meta information</p>
</header>
<nav> <!-- Table of contents -->
<ul>...</ul>
</nav>
<section> <!-- Major section -->
<h2>Section Title</h2>
<p>Content...</p>
</section>
<footer> <!-- Article footer -->
<p>Author bio, related articles</p>
</footer>
</article>
8. List Markup for Structured Content
Use <ul> (unordered) or <ol> (ordered) lists for:
- Bullet points and numbered lists
- Feature lists
- Step-by-step instructions
- Comparison points
<ol>
<li>First step in the process</li>
<li>Second step</li>
<li>Third step</li>
</ol>
LLMs extract list content more cleanly than prose lists that aren't properly marked up.
9. Table Markup for Comparisons and Data
Use <table> for structured data:
<table>
<thead>
<tr>
<th>Feature</th>
<th>Platform A</th>
<th>Platform B</th>
</tr>
</thead>
<tbody>
<tr>
<td>Price per seat</td>
<td>$50</td>
<td>$75</td>
</tr>
</tbody>
</table>
Tables are extracted more accurately by LLMs than prose comparisons.
10. Emphasis and Strong Tag Usage
Use <em> for emphasis and <strong> for strong emphasis:
<p>Marketing automation is <strong>essential</strong> for modern B2B teams.</p>
Don't use CSS to style text as bold or italic; use semantic tags.
Schema Markup and Structured Data
Schema.org markup helps crawlers understand what type of content you're publishing and what it contains.
11. Article Schema
Every long-form article should include Article schema in JSON-LD format:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "The Technical Checklist: Is Your Website AI-Ready?",
"description": "20+ technical elements that determine AI readiness",
"image": "https://example.com/image.jpg",
"datePublished": "2026-03-15",
"dateModified": "2026-03-20",
"author": {
"@type": "Person",
"name": "Author Name",
"url": "https://example.com/authors/author-name"
},
"publisher": {
"@type": "Organization",
"name": "Your Company",
"logo": {
"@type": "ImageObject",
"url": "https://example.com/logo.png"
}
}
}
</script>
12. FAQPage Schema
If your article includes an FAQ section, use FAQPage schema:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How do I know if my website is AI-ready?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Run through this checklist of 20+ technical elements..."
}
}
]
}
</script>
13. BreadcrumbList Schema
Help AI systems understand your site structure:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "Home",
"item": "https://example.com/"
},
{
"@type": "ListItem",
"position": 2,
"name": "Resources",
"item": "https://example.com/resources"
},
{
"@type": "ListItem",
"position": 3,
"name": "Article Title",
"item": "https://example.com/article-title"
}
]
}
</script>
14. HowTo Schema
For instructional content:
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "How to Implement Demand Generation",
"step": [
{
"@type": "HowToStep",
"name": "Define your target audience",
"text": "..."
},
{
"@type": "HowToStep",
"name": "Develop content strategy",
"text": "..."
}
]
}
15. Organization Schema
Your homepage should include Organization schema:
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Your Company Name",
"url": "https://example.com",
"logo": "https://example.com/logo.png",
"description": "What your company does",
"sameAs": [
"https://linkedin.com/company/...",
"https://twitter.com/..."
],
"contactPoint": {
"@type": "ContactPoint",
"contactType": "Sales",
"telephone": "+1-XXX-XXX-XXXX"
}
}
16. Author Schema
Create dedicated author pages with author schema:
{
"@context": "https://schema.org",
"@type": "Person",
"name": "Author Name",
"jobTitle": "Director of Demand Generation",
"worksFor": {
"@type": "Organization",
"name": "Your Company"
},
"url": "https://example.com/authors/author-name",
"sameAs": [
"https://linkedin.com/in/..."
]
}
Metadata and Open Graph Tags
Your <head> should contain explicit metadata that tells AI crawlers about your content.
17. Title and Meta Description
Every page needs:
<title>Title | Your Company</title>
<meta name="description" content="One-line summary of the page, 150-160 characters">
The title should be SEO-optimised. The description should be a natural summary (not keyword-stuffed).
18. Open Graph Tags
Help AI systems understand what your content is about:
<meta property="og:type" content="article">
<meta property="og:title" content="The Technical Checklist: Is Your Website AI-Ready?">
<meta property="og:description" content="One-line summary">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="og:url" content="https://example.com/article-url">
19. Twitter Card Tags
For consistency on social platforms:
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Article Title">
<meta name="twitter:description" content="Summary">
<meta name="twitter:image" content="https://example.com/image.jpg">
20. Language and Charset Declaration
Every page should have:
<meta charset="UTF-8">
<html lang="en">
For international sites, use hreflang tags to indicate language versions.
Page Performance and Core Web Vitals
AI crawlers assess content quality partially through page performance. Slow pages signal lower quality to ranking and crawling systems.
21. Core Web Vitals
Optimise for Google's three Core Web Vitals:
- Largest Contentful Paint (LCP): The largest visible content element should render in under 2.5 seconds
- Cumulative Layout Shift (CLS): The page shouldn't shift unexpectedly as it loads; target < 0.1
- Interaction to Next Paint (INP): The page should respond to interactions within 200ms (INP replaced First Input Delay as a Core Web Vital in March 2024)
22. Page Speed Optimisation
- Compress images (use modern formats like WebP)
- Minify CSS and JavaScript
- Implement lazy loading for images and video
- Use a CDN to serve content from locations near users
- Remove render-blocking resources
- Optimise third-party scripts (analytics, ads)
23. Mobile Responsiveness
AI crawlers access your site from various devices and viewport sizes. Ensure:
- Your site is fully responsive to all screen sizes
- Text is readable without zooming
- Buttons and links are touch-friendly (48px minimum)
- Navigation works on mobile
Test with Google's Mobile-Friendly Test and across various real devices.
Content Accessibility and Readability
AI crawlers benefit from content that follows accessibility standards (WCAG 2.1 Level AA).
24. Image Alt Text
Every image should have descriptive alt text:
<img src="audience-analysis.jpg" alt="Sample audience segmentation showing demand generation targets">
Alt text helps AI crawlers understand what's in images and serves blind and low-vision users.
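Missing alt text is easy to audit automatically. This sketch uses Python's standard-library HTML parser to flag `<img>` tags with no alt attribute (or an empty one); the sample markup is illustrative:

```python
# Sketch: flag <img> tags whose alt attribute is missing or empty.
# The sample HTML below is illustrative.
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not attrs.get("alt"):  # missing or empty alt
                self.missing.append(attrs.get("src", "(no src)"))

def images_missing_alt(html: str):
    audit = AltAudit()
    audit.feed(html)
    return audit.missing

html = '<img src="chart.png"><img src="ok.jpg" alt="Segmentation chart">'
print(images_missing_alt(html))  # → ['chart.png']
```

Note that purely decorative images legitimately use an empty alt attribute, so treat the output as a review list rather than a list of definite errors.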
25. Color Contrast
Text should have sufficient contrast against backgrounds (4.5:1 ratio for normal text, 3:1 for large text).
26. Focus Indicators
Interactive elements should have visible focus states for keyboard navigation.
27. Language and Readability Metrics
AI crawlers can assess readability. Use:
- Clear sentence structure
- Short paragraphs (3-5 sentences)
- Active voice where possible
- Jargon defined when first introduced
Aim for 8th-9th grade reading level for B2B content (high school education level). Tools like Hemingway App and Readability Statistics can help.
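If you want a rough in-house estimate, the Flesch-Kincaid grade-level formula is simple to implement. The coefficients below are the published formula, but the syllable counter is a crude vowel-group heuristic, so treat the result as an approximation; purpose-built tools are more accurate:

```python
# Sketch of the Flesch-Kincaid grade-level formula.
# The syllable counter is a rough vowel-group heuristic.
import re

def _syllables(word: str) -> int:
    # Count runs of vowels as syllables; every word has at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words))
            - 15.59)

print(round(fk_grade("The cat sat on the mat. The dog ran fast."), 2))
```

Short sentences of one-syllable words score well below grade 8; long sentences of polysyllabic jargon push the score up quickly.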
AI-Specific Technical Requirements
28. X-Robots-Tag HTTP Header
In addition to meta robots tags, you can use HTTP headers:
X-Robots-Tag: index, follow, max-snippet:-1, max-image-preview:large
29. Allow AI Crawlers in Robots.txt
Explicitly allow major AI crawlers:
- GPTBot (OpenAI)
- CCBot (Common Crawl)
- PerplexityBot (Perplexity)
- Google-Extended (Google's generative AI crawling)
- ClaudeBot (Anthropic)
30. Monitor Crawl Patterns
Use Google Search Console to monitor:
- Which crawlers are accessing your site
- How frequently they crawl
- Any crawl errors
- Coverage issues (pages that aren't being indexed)
Look for GPTBot, ClaudeBot, and PerplexityBot requests in your server logs. (Google-Extended is a robots.txt token rather than a separate crawler, so Google's generative AI fetching appears as regular Googlebot traffic.)
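A log check like this can be sketched in a few lines. The user-agent tokens and sample log lines below are illustrative, and the matching is a simple case-insensitive substring test rather than full log parsing:

```python
# Sketch: count AI-crawler requests in an access log by matching
# user-agent substrings. Sample log lines are illustrative.
from collections import Counter

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

def count_ai_hits(log_lines):
    hits = Counter()
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent.lower() in line.lower():
                hits[agent] += 1
    return hits

sample = [
    '1.2.3.4 - - [15/Mar/2026] "GET /guide HTTP/1.1" 200 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [15/Mar/2026] "GET /guide HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_ai_hits(sample))
```

In practice you would feed this a file handle (`count_ai_hits(open("access.log"))`) and compare counts week over week to see whether AI crawl activity is growing.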
31. Content Freshness
Include publication and modification dates in:
- Article schema JSON-LD
- Your <head> meta tags
- Visibly on the page (helps both users and crawlers)
Update modification dates when you revise content significantly.
Audit and Implementation Checklist
Crawlability (Priority: Critical)
- Robots.txt allows GPTBot, ClaudeBot, PerplexityBot, and Google-Extended
- XML sitemap exists and is updated monthly
- Canonical tags on all content pages
- No critical crawl errors in Search Console
- Important content is not noindexed
Structure (Priority: Critical)
- Every page has exactly one H1
- Headings follow logical hierarchy (H1 → H2 → H3)
- Content uses semantic HTML (<article>, <section>, <nav>)
- Lists are marked up with <ul> or <ol>
- Tables use <table> with <thead> and <tbody>
Schema Markup (Priority: High)
- Article schema on all long-form content
- FAQPage schema if you have Q&A sections
- Organization schema on homepage
- Author schema on author pages
- BreadcrumbList schema for site navigation
- Validate all schema with Google's Rich Results Test or the Schema.org validator (the older Structured Data Testing Tool has been retired)
Metadata (Priority: High)
- Every page has unique title tag (50-60 characters)
- Every page has meta description (150-160 characters)
- Open Graph tags (og:title, og:description, og:image, og:url)
- Publication date visible and in schema
- Last modified date updated when content changes
Performance (Priority: High)
- LCP < 2.5 seconds
- CLS < 0.1
- Mobile-Friendly Test passes
- Images optimised and lazy-loaded
- CSS/JS minified
Accessibility (Priority: Medium)
- All images have alt text
- Text contrast ratio ≥ 4.5:1
- Interactive elements are keyboard navigable
- Reading level appropriate (8th-9th grade for B2B)
Content Quality (Priority: High)
- No duplicate content without canonical tags
- Thin content is noindexed or consolidated
- Promotional content clearly marked
- Citations for all non-obvious claims
- Author credentials included
Ross Williams
Ross Williams is the founder of Fortitude Media, specialising in AI visibility and content strategy for B2B companies.