The Technical Checklist: Is Your Website AI-Ready?
20+ technical elements determine whether AI crawlers can extract information from your site. Use this checklist to audit your website and fix gaps...

Summary: Most websites were built and optimised for Google's algorithm, not for large language models. AI crawlers have different requirements than Googlebot. They need clearer structure, better semantic markup, and more explicit metadata. This checklist covers 20+ technical elements that determine whether AI systems can extract and recommend your content.
Why Technical Setup Matters for AI
AI crawlers (GPTBot, ClaudeBot, PerplexityBot, and others) face a different challenge than Google. Googlebot needs to understand your site well enough to rank it. AI crawlers need to understand your site well enough to extract useful information and include it in generative responses.
This distinction matters because:
1. AI Crawlers Need Clearer Hierarchy
Google's algorithm can infer topical relevance from various signals. AI crawlers need your content hierarchy to be explicit. If an article about "demand generation" has 7 H2 sections of equal weight, the LLM struggles to determine what's most important. If one section is clearly marked as the core definition and others are clearly marked as subcategories, the extraction is cleaner.
2. AI Systems Value Semantic Markup
Google ranking doesn't depend heavily on semantic HTML (using <article>, <nav>, <section> tags). AI crawlers depend on it. Properly semantic HTML helps AI systems understand which content is primary, which is navigation, and which is sidebar information.
3. AI Crawlers Need Explicit Metadata
Google can infer article dates, author information, and content type from various signals. AI crawlers rely more heavily on explicit metadata in <head> tags. A missing publication date in your markup makes it harder for LLMs to assess freshness.
4. AI Systems Extract and Synthesise Differently
Google's algorithm creates a ranking score. AI systems extract information. This means your content needs to be extractable as distinct, quotable, attributable pieces. Long paragraphs of prose are harder to extract than structured lists or clearly delimited sections.
Crawlability and Indexation
Before any of your content can be included in AI responses, AI crawlers need to be able to find and read your pages.
1. Robots.txt Configuration
Your robots.txt file should explicitly allow AI crawlers. Create rules for:
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
If you block specific crawlers, you're preventing your content from being included in their generated responses. For most businesses, allowing these crawlers is beneficial.
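You can verify these rules programmatically. Here is a minimal sketch using Python's standard-library `urllib.robotparser`, which applies the same matching rules the crawlers follow; the robots.txt content and test URL below are illustrative:

```python
# Sketch: check whether a robots.txt permits the major AI crawlers.
# The ROBOTS_TXT content and test URL are illustrative examples.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

AI_CRAWLERS = ["GPTBot", "CCBot", "PerplexityBot", "ClaudeBot"]

def audit_robots(robots_txt: str, url: str = "https://example.com/blog/post"):
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Crawlers without a dedicated group fall back to the "*" rules.
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}

print(audit_robots(ROBOTS_TXT))
```

Running this against your live robots.txt (via `RobotFileParser.set_url` and `read`) gives the same per-crawler answer before you deploy changes.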
2. Sitemap Configuration
Maintain an updated XML sitemap with:
- All important content pages
- Last modification date (lastmod) for each page
- Content priority (priority tags)
- Language information (hreflang tags for international sites)
The sitemap should be less than 50MB and contain fewer than 50,000 URLs per file. For larger sites, use sitemap index files.
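A sitemap with lastmod entries can be generated with nothing but the standard library. This sketch uses Python's `xml.etree.ElementTree`; the page URLs and dates are illustrative:

```python
# Sketch: build a minimal XML sitemap with lastmod entries.
# URLs and dates below are illustrative, not real pages.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (url, lastmod_iso_date) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([
    ("https://example.com/definitive-guide-demand-generation", "2026-03-20"),
])
print(xml)
```

For larger sites, the same approach extends to generating a `sitemapindex` file that points at multiple sub-sitemaps, each under the 50MB / 50,000-URL limits.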
3. Canonical Tags
Every page should have a canonical tag pointing to itself (or to the canonical version if you have duplicate content). This helps crawlers identify the authoritative version of content.
<link rel="canonical" href="https://example.com/definitive-guide-demand-generation" />
4. Noindex and Nofollow Usage
Selectively use noindex for:
- Thin content pages (tag pages, archive pages)
- Duplicate content that shouldn't rank independently
- Pages in testing or development
But do NOT noindex:
- Your primary content pages
- Pillar pages and cluster content
- Pages you want included in AI Overviews
Use nofollow sparingly. For internal links, it's rarely necessary. Use it for:
- Links to paid or sponsored content
- Links to user-generated content you don't endorse
- Links to low-quality external sites
5. Meta Robots Tag
Your <head> should include:
<meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1">
The snippet and preview settings tell crawlers they can use your content generously in responses.
Semantic HTML and Structure
AI crawlers understand content hierarchy through both semantic HTML tags and heading structure. Your content should use:
6. Proper Heading Hierarchy
- Every page has exactly one H1 tag (the main title)
- H2 tags break the page into major sections
- H3 tags break H2 sections into subsections
- Headings should be descriptive and follow a logical hierarchy (no skipping from H2 to H4)
Bad hierarchy:
<h1>Ultimate Guide to Demand Generation</h1>
<h2>What is Demand Gen?</h2>
<h2>Demand Gen Tools</h2>
<h4>Marketo</h4> <!-- Missing H3 -->
Good hierarchy:
<h1>Ultimate Guide to Demand Generation</h1>
<h2>What is Demand Generation?</h2>
<h3>Demand Gen vs Lead Gen</h3>
<h2>Demand Generation Tools</h2>
<h3>Marketo</h3>
<h4>Marketo Pricing</h4>
7. Semantic HTML Tags
Use semantic tags to mark up content:
<article> <!-- Main content wrapper -->
<header> <!-- Article header -->
<h1>Title</h1>
<p>Meta information</p>
</header>
<nav> <!-- Table of contents -->
<ul>...</ul>
</nav>
<section> <!-- Major section -->
<h2>Section Title</h2>
<p>Content...</p>
</section>
<footer> <!-- Article footer -->
<p>Author bio, related articles</p>
</footer>
</article>
8. List Markup for Structured Content
Use <ul> (unordered) or <ol> (ordered) lists for:
- Bullet points and numbered lists
- Feature lists
- Step-by-step instructions
- Comparison points
<ol>
<li>First step in the process</li>
<li>Second step</li>
<li>Third step</li>
</ol>
LLMs extract list content more cleanly than prose lists that aren't properly marked up.
9. Table Markup for Comparisons and Data
Use <table> for structured data:
<table>
<thead>
<tr>
<th>Feature</th>
<th>Platform A</th>
<th>Platform B</th>
</tr>
</thead>
<tbody>
<tr>
<td>Price per seat</td>
<td>$50</td>
<td>$75</td>
</tr>
</tbody>
</table>
Tables are extracted more accurately by LLMs than prose comparisons.
10. Emphasis and Strong Tag Usage
Use <em> for emphasis and <strong> for strong emphasis:
<p>Marketing automation is <strong>essential</strong> for modern B2B teams.</p>
Don't use CSS to style text as bold or italic; use semantic tags.
Schema Markup and Structured Data
Schema.org markup helps crawlers understand what type of content you're publishing and what it contains.
11. Article Schema
Every long-form article should include Article schema in JSON-LD format:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "The Technical Checklist: Is Your Website AI-Ready?",
"description": "20+ technical elements that determine AI readiness",
"image": "https://example.com/image.jpg",
"datePublished": "2026-03-15",
"dateModified": "2026-03-20",
"author": {
"@type": "Person",
"name": "Author Name",
"url": "https://example.com/authors/author-name"
},
"publisher": {
"@type": "Organization",
"name": "Your Company",
"logo": {
"@type": "ImageObject",
"url": "https://example.com/logo.png"
}
}
}
</script>
12. FAQPage Schema
If your article includes an FAQ section, use FAQPage schema:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How do I know if my website is AI-ready?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Run through this checklist of 20+ technical elements..."
}
}
]
}
</script>
13. BreadcrumbList Schema
Help AI systems understand your site structure:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "Home",
"item": "https://example.com/"
},
{
"@type": "ListItem",
"position": 2,
"name": "Resources",
"item": "https://example.com/resources"
},
{
"@type": "ListItem",
"position": 3,
"name": "Article Title",
"item": "https://example.com/article-title"
}
]
}
</script>
14. HowTo Schema
For instructional content:
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "How to Implement Demand Generation",
"step": [
{
"@type": "HowToStep",
"name": "Define your target audience",
"text": "..."
},
{
"@type": "HowToStep",
"name": "Develop content strategy",
"text": "..."
}
]
}
15. Organization Schema
Your homepage should include Organization schema:
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Your Company Name",
"url": "https://example.com",
"logo": "https://example.com/logo.png",
"description": "What your company does",
"sameAs": [
"https://linkedin.com/company/...",
"https://twitter.com/..."
],
"contactPoint": {
"@type": "ContactPoint",
"contactType": "Sales",
"telephone": "+1-XXX-XXX-XXXX"
}
}
16. Author Schema
Create dedicated author pages with author schema:
{
"@context": "https://schema.org",
"@type": "Person",
"name": "Author Name",
"jobTitle": "Director of Demand Generation",
"worksFor": {
"@type": "Organization",
"name": "Your Company"
},
"url": "https://example.com/authors/author-name",
"sameAs": [
"https://linkedin.com/in/..."
]
}
Metadata and Open Graph Tags
Your <head> should contain explicit metadata that tells AI crawlers about your content.
17. Title and Meta Description
Every page needs:
<title>Title | Your Company</title>
<meta name="description" content="One-line summary of the page, 150-160 characters">
The title should be SEO-optimised. The description should be a natural summary (not keyword-stuffed).
18. Open Graph Tags
Help AI systems understand what your content is about:
<meta property="og:type" content="article">
<meta property="og:title" content="The Technical Checklist: Is Your Website AI-Ready?">
<meta property="og:description" content="One-line summary">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="og:url" content="https://example.com/article-url">
19. Twitter Card Tags
For consistency on social platforms:
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Article Title">
<meta name="twitter:description" content="Summary">
<meta name="twitter:image" content="https://example.com/image.jpg">
20. Language and Charset Declaration
Every page should have:
<meta charset="UTF-8">
<html lang="en">
For international sites, use hreflang tags to indicate language versions.
Page Performance and Core Web Vitals
AI crawlers assess content quality partially through page performance. Slow pages signal lower quality to ranking and crawling systems.
21. Core Web Vitals
Optimise for Google's three Core Web Vitals:
- Largest Contentful Paint (LCP): The largest visible content element should render in under 2.5 seconds
- Cumulative Layout Shift (CLS): The page shouldn't shift unexpectedly as it loads; target < 0.1
- Interaction to Next Paint (INP): The page should respond to interactions within 200ms (INP replaced First Input Delay as a Core Web Vital in March 2024)
22. Page Speed Optimisation
- Compress images (use modern formats like WebP)
- Minify CSS and JavaScript
- Implement lazy loading for images and video
- Use a CDN to serve content from locations near users
- Remove render-blocking resources
- Optimise third-party scripts (analytics, ads)
23. Mobile Responsiveness
AI crawlers access your site from various devices and viewport sizes. Ensure:
- Your site is fully responsive to all screen sizes
- Text is readable without zooming
- Buttons and links are touch-friendly (48px minimum)
- Navigation works on mobile
Test with Google's Mobile-Friendly Test and across various real devices.
Content Accessibility and Readability
AI crawlers benefit from content that follows accessibility standards (WCAG 2.1 Level AA).
24. Image Alt Text
Every image should have descriptive alt text:
<img src="audience-analysis.jpg" alt="Sample audience segmentation showing demand generation targets">
Alt text helps AI crawlers understand what's in images and serves blind and low-vision users.
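Missing alt text is easy to audit automatically. This sketch uses Python's standard-library HTML parser to flag `<img>` tags with no alt attribute (or an empty one); the sample markup is illustrative:

```python
# Sketch: flag <img> tags whose alt attribute is missing or empty.
# The sample HTML below is illustrative.
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not attrs.get("alt"):  # missing or empty alt
                self.missing.append(attrs.get("src", "(no src)"))

def images_missing_alt(html: str):
    audit = AltAudit()
    audit.feed(html)
    return audit.missing

html = '<img src="chart.png"><img src="ok.jpg" alt="Segmentation chart">'
print(images_missing_alt(html))  # → ['chart.png']
```

Note that purely decorative images legitimately use an empty alt attribute, so treat the output as a review list rather than a list of definite errors.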
25. Color Contrast
Text should have sufficient contrast against backgrounds (4.5:1 ratio for normal text, 3:1 for large text).
26. Focus Indicators
Interactive elements should have visible focus states for keyboard navigation.
27. Language and Readability Metrics
AI crawlers can assess readability. Use:
- Clear sentence structure
- Short paragraphs (3-5 sentences)
- Active voice where possible
- Jargon defined when first introduced
Aim for 8th-9th grade reading level for B2B content (high school education level). Tools like Hemingway App and Readability Statistics can help.
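If you want a rough in-house estimate, the Flesch-Kincaid grade-level formula is simple to implement. The coefficients below are the published formula, but the syllable counter is a crude vowel-group heuristic, so treat the result as an approximation; purpose-built tools are more accurate:

```python
# Sketch of the Flesch-Kincaid grade-level formula.
# The syllable counter is a rough vowel-group heuristic.
import re

def _syllables(word: str) -> int:
    # Count runs of vowels as syllables; every word has at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words))
            - 15.59)

print(round(fk_grade("The cat sat on the mat. The dog ran fast."), 2))
```

Short sentences of one-syllable words score well below grade 8; long sentences of polysyllabic jargon push the score up quickly.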
AI-Specific Technical Requirements
28. X-Robots-Tag HTTP Header
In addition to meta robots tags, you can use HTTP headers:
X-Robots-Tag: index, follow, max-snippet:-1, max-image-preview:large
29. Allow AI Crawlers in Robots.txt
Explicitly allow major AI crawlers:
- GPTBot (OpenAI)
- CCBot (Common Crawl)
- PerplexityBot (Perplexity)
- Google-Extended (Google's generative AI crawling)
- ClaudeBot (Anthropic)
30. Monitor Crawl Patterns
Use Google Search Console to monitor:
- Which crawlers are accessing your site
- How frequently they crawl
- Any crawl errors
- Coverage issues (pages that aren't being indexed)
Look for GPTBot, ClaudeBot, and PerplexityBot requests in your server logs. (Google-Extended is a robots.txt token rather than a separate crawler, so Google's generative AI fetching appears as regular Googlebot traffic.)
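A log check like this can be sketched in a few lines. The user-agent tokens and sample log lines below are illustrative, and the matching is a simple case-insensitive substring test rather than full log parsing:

```python
# Sketch: count AI-crawler requests in an access log by matching
# user-agent substrings. Sample log lines are illustrative.
from collections import Counter

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

def count_ai_hits(log_lines):
    hits = Counter()
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent.lower() in line.lower():
                hits[agent] += 1
    return hits

sample = [
    '1.2.3.4 - - [15/Mar/2026] "GET /guide HTTP/1.1" 200 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [15/Mar/2026] "GET /guide HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_ai_hits(sample))
```

In practice you would feed this a file handle (`count_ai_hits(open("access.log"))`) and compare counts week over week to see whether AI crawl activity is growing.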
31. Content Freshness
Include publication and modification dates in:
- Article schema JSON-LD
- Your <head> meta tags
- Visibly on the page (helps both users and crawlers)
Update modification dates when you revise content significantly.
Audit and Implementation Checklist
Crawlability (Priority: Critical)
- Robots.txt allows GPTBot, ClaudeBot, PerplexityBot, and Google-Extended
- XML sitemap exists and is updated monthly
- Canonical tags on all content pages
- No critical crawl errors in Search Console
- Important content is not noindexed
Structure (Priority: Critical)
- Every page has exactly one H1
- Headings follow logical hierarchy (H1 → H2 → H3)
- Content uses semantic HTML (<article>, <section>, <nav>)
- Lists are marked up with <ul> or <ol>
- Tables use <table> with <thead> and <tbody>
Schema Markup (Priority: High)
- Article schema on all long-form content
- FAQPage schema if you have Q&A sections
- Organization schema on homepage
- Author schema on author pages
- BreadcrumbList schema for site navigation
- Validate all schema with Google's Rich Results Test or the Schema.org validator (the older Structured Data Testing Tool has been retired)
Metadata (Priority: High)
- Every page has unique title tag (50-60 characters)
- Every page has meta description (150-160 characters)
- Open Graph tags (og:title, og:description, og:image, og:url)
- Publication date visible and in schema
- Last modified date updated when content changes
Performance (Priority: High)
- LCP < 2.5 seconds
- CLS < 0.1
- Mobile-Friendly Test passes
- Images optimised and lazy-loaded
- CSS/JS minified
Accessibility (Priority: Medium)
- All images have alt text
- Text contrast ratio ≥ 4.5:1
- Interactive elements are keyboard navigable
- Reading level appropriate (8th-9th grade for B2B)
Content Quality (Priority: High)
- No duplicate content without canonical tags
- Thin content is noindexed or consolidated
- Promotional content clearly marked
- Citations for all non-obvious claims
- Author credentials included
Ross Williams
Ross Williams is the founder of Fortitude Media, specialising in AI visibility and content strategy for B2B companies.