Content Quality

    Why AI Penalises Thin Content and How to Fix It

    Ross Williams | 14 min read | Tuesday, 31st March 2026

    LLMs detect thin, duplicated, and padded content at scale. Learn how AI systems identify low-quality content and the remediation framework to improve...

    Summary: Large language models can directly assess content quality in ways that Google's algorithm struggled to do for years. AI systems instantly recognise whether content is thin, duplicated, padded with filler, or genuinely substantive. This creates a dramatic shift: low-quality content that might have ranked through traditional SEO tricks is now worthless to AI systems. Understanding what AI considers "thin" and how to remediate your content library is critical for visibility in the AI era.

    What LLMs Consider "Thin Content"

    Thin content is content that offers little genuine value to readers. For decades, Google struggled to identify thin content algorithmically. Google relied on proxies: if a page had few inbound links, it was probably thin. If it had high bounce rates, it was probably thin. If it was short, it might be thin.

    LLMs don't use proxies. They directly assess whether content is substantive or hollow.

    Thin Content Categories That LLMs Penalise

    1. Definitional Content Without Depth

    Thin: "Demand generation is the process of building awareness for a product or service among potential customers. It involves creating content and marketing campaigns to attract leads."

    Substantive: "Demand generation is the systematic process of creating awareness and interest in your product among a target audience. Unlike lead generation, which typically focuses on converting ready-to-buy prospects, demand generation casts a wider net across the consideration funnel. Demand generation campaigns might reach prospects 12 months before they're ready to buy, planting seeds for future conversion. This requires different content types (awareness-focused, educational) and different measurement approaches (brand lift, consideration metrics) than lead generation. The distinction matters because demand generation typically has longer payoff periods (18+ months) while lead generation focuses on 90-day conversion windows."

    The thin version states the definition. The substantive version explains the concept, contrasts it with related concepts, explains the business rationale, and contextualises timeframes.

    LLMs reading the thin version extract minimal unique information. LLMs reading the substantive version extract:

    • A clear definition
    • Distinction from related concepts
    • Appropriate use cases and time horizons
    • Measurement approaches

    2. Listicles Without Analysis

    Thin: "5 Demand Generation Best Practices

    1. Create valuable content
    2. Use multiple channels
    3. Track your metrics
    4. Build partnerships
    5. Optimise continuously"

    Substantive: "5 Demand Generation Best Practices

    1. Create valuable content

      • Most B2B demand generation content fails because it's written for the company's priorities (product features) rather than the prospect's priorities (their problems). Substantive content answers: 'What challenges does this prospect face?' not 'What is our product?'
    2. Use multiple channels

      • Single-channel demand gen (email-only, content-only, events-only) reaches only a fraction of your target audience. Map your target personas and identify where they consume information: LinkedIn, industry publications, podcasts, conferences, etc.

    [Continues with similar depth for each item]"

    The thin version is a bullet list with no context or explanation. LLMs reading it extract very little. The substantive version is a listicle that actually explains why each practice matters.

    3. Content Padding and Filler

    Thin: "Demand generation is an important strategy for B2B companies. Many B2B companies use demand generation. B2B companies of all sizes need demand generation. In today's competitive landscape, demand generation is essential. Let's explore demand generation further..."

    This is the same information repeated, padded with transitional phrases that add no value. LLMs immediately recognise this as padding.

    4. Keyword-Stuffed Content

    Thin: "Demand generation software helps with demand generation. The best demand generation software includes demand generation features. Demand generation platforms offer demand generation solutions..."

    Repetitive keyword usage without new information is a clear signal of thinness.
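
One rough way to spot this pattern in your own pages is to measure how much of the text a single phrase takes up. The sketch below is an illustrative heuristic only, not a description of how any LLM works; the function name `phrase_share` is hypothetical.

```python
import re

def phrase_share(text: str, phrase: str) -> float:
    """Fraction of the words in `text` that belong to occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of n words across the text and count exact phrase matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

stuffed = (
    "Demand generation software helps with demand generation. "
    "The best demand generation software includes demand generation features. "
    "Demand generation platforms offer demand generation solutions."
)
share = phrase_share(stuffed, "demand generation")
print(f"{share:.0%} of the words belong to the target phrase")
```

A share this high is a prompt to reread the passage, not a verdict; what counts as "too high" is a judgement call for your own content.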

    5. Generic, Boilerplate Content

    Thin: "Demand generation is a complex topic. Many companies struggle with demand generation. It's important to choose the right demand generation strategy. Working with a demand generation agency can help you implement demand generation. There are many demand generation tools available."

    This could describe any topic. It has no specific information, no insight, no value.

    6. Duplicated or Near-Duplicate Content

    Thin: Multiple pages on your site saying nearly the same thing, with minor variations:

    • "Demand generation strategy"
    • "Guide to demand generation strategy"
    • "Best demand generation strategy"

    All three cover the same ground with minimal variation.

    How LLMs Detect Content Quality

    LLMs have multiple mechanisms for assessing whether content is thin or substantive.

    1. Information Density Analysis

    LLMs can measure "information density" — how much unique, useful information is present per word.

    High density: "In our analysis of 100 mid-market SaaS companies, demand generation spending averaged 12% of total marketing budget, with ROI ranging from 3:1 to 7:1 based on sales cycle length. Early-stage companies (ARR < $5M) allocated 18% of budget; mature companies (ARR > $50M) allocated 8%."

    Low density: "Demand generation spending varies. Some companies spend more on demand generation. The amount you spend depends on your company and your needs. You should allocate enough budget for demand generation."

    The high-density example provides specific numbers and contextual variation. The low-density example provides no specific information.
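
As a crude stand-in for this idea when auditing your own pages, you can compare how many distinct content words a passage contains relative to its length. This is a sketch under stated assumptions: the `density` function and the short stopword list are hypothetical, and the ratio naturally falls as texts get longer, so only compare passages of similar length.

```python
import re

# Small illustrative stopword list (an assumption); a real audit would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "with", "from", "our", "you", "your", "some", "more", "should"}

def density(text: str) -> float:
    """Distinct non-stopword tokens divided by total tokens (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    distinct = {t for t in tokens if t not in STOPWORDS}
    return len(distinct) / len(tokens)

high = (
    "In our analysis of 100 mid-market SaaS companies, demand generation spending "
    "averaged 12% of total marketing budget, with ROI ranging from 3:1 to 7:1 based "
    "on sales cycle length. Early-stage companies (ARR < $5M) allocated 18% of "
    "budget; mature companies (ARR > $50M) allocated 8%."
)
low = (
    "Demand generation spending varies. Some companies spend more on demand "
    "generation. The amount you spend depends on your company and your needs. "
    "You should allocate enough budget for demand generation."
)
print(round(density(high), 2), round(density(low), 2))
```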

    2. Concept Repetition Detection

    LLMs track whether the same concept is repeated without new information added.

    Detected as thin:

    • "Demand generation is important. It's critical to have a demand generation strategy. You need to prioritise demand generation. Demand generation should be part of your marketing plan."

    Detected as substantive:

    • "Demand generation should represent 15-20% of your total marketing budget. Beyond that allocation, other channels (product-led growth, partnerships) may be more efficient. Below that allocation, your brand isn't reaching enough of your target audience in early research stages."

    The second example provides new information with each sentence.
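
The difference between the two examples can be approximated by counting, sentence by sentence, how many content words have not appeared earlier in the passage. This is a minimal sketch, assuming a simple regex sentence splitter and a hypothetical stopword list; it illustrates the idea and makes no claim about what an LLM does internally.

```python
import re

# Illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with",
             "is", "are", "be", "it", "it's", "that", "this", "you", "your",
             "should", "have", "may", "more"}

def new_terms_per_sentence(text: str) -> list[int]:
    """For each sentence, count content words not seen in any earlier sentence."""
    seen: set[str] = set()
    counts = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        terms = {w for w in re.findall(r"[a-z0-9']+", sentence.lower())
                 if w not in STOPWORDS}
        counts.append(len(terms - seen))
        seen |= terms
    return counts

thin = ("Demand generation is important. It's critical to have a demand generation "
        "strategy. You need to prioritise demand generation. Demand generation "
        "should be part of your marketing plan.")
rich = ("Demand generation should represent 15-20% of your total marketing budget. "
        "Beyond that allocation, other channels (product-led growth, partnerships) "
        "may be more efficient. Below that allocation, your brand isn't reaching "
        "enough of your target audience in early research stages.")

thin_counts = new_terms_per_sentence(thin)
rich_counts = new_terms_per_sentence(rich)
print("thin:", thin_counts)
print("rich:", rich_counts)
```

Sentences that keep scoring low after the first are candidates for cutting or merging.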

    3. Specificity Assessment

    LLMs detect whether content is specific or generic.

    Specific: "In demand generation for enterprise SaaS (contracts >$100K), LinkedIn campaigns to IT buyers averaged 0.8% CTR and $200 CPC in Q4 2025. Comparable figures for mid-market (contracts $20-100K) show 1.3% CTR and $140 CPC."

    Generic: "You should run campaigns on LinkedIn. LinkedIn campaigns are effective. Different audiences respond differently to LinkedIn campaigns."

    The specific version provides measurable, contextual information. The generic version provides no information beyond "LinkedIn is useful."
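
A simple proxy for specificity is the number of quantified claims in a passage: figures, percentages, and currency amounts. The sketch below assumes a hypothetical `quantified_claims` helper built on a regular expression; it will miss specifics expressed in words and should be read as a rough audit aid.

```python
import re

# Matches numbers with an optional "$" prefix and an optional %, K, M or B suffix.
# The lookbehind skips digits glued to letters or other digits, such as the 4 in "Q4".
QUANTITY = re.compile(r"(?<![A-Za-z\d])\$?\d+(?:\.\d+)?(?:%|[KMB]\b)?")

def quantified_claims(text: str) -> list[str]:
    """Return every number-like token found in `text`, in order of appearance."""
    return QUANTITY.findall(text)

specific = (
    "In demand generation for enterprise SaaS (contracts >$100K), LinkedIn campaigns "
    "to IT buyers averaged 0.8% CTR and $200 CPC in Q4 2025. Comparable figures for "
    "mid-market (contracts $20-100K) show 1.3% CTR and $140 CPC."
)
generic = (
    "You should run campaigns on LinkedIn. LinkedIn campaigns are effective. "
    "Different audiences respond differently to LinkedIn campaigns."
)

print(quantified_claims(specific))
print(quantified_claims(generic))
```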

    4. Source-Alignment Detection

    LLMs check whether claims are supported by context.

    Unsupported: "Demand generation ROI is typically 8:1. Most companies see significant results from demand generation. Demand generation is proven to work."

    Supported: "Based on our analysis of 150 companies, demand generation typically achieves 4:1 to 8:1 ROI, with the variation depending on three factors: 1) Sales cycle length (longer = lower ROI in year one, higher in year two+), 2) Market maturity (established markets = higher ROI), and 3) Team experience (experienced teams = 2-3x higher ROI in years 1-2)."

    The supported version explains its claims with evidence and context.

    5. Logical Coherence Assessment

    LLMs check whether ideas flow logically.

    Incoherent: "Demand generation is important. Your website should be fast. Content is key to demand gen. SEO and PPC are different. You should use both channels."

    Coherent: "Demand generation succeeds when: 1) You reach the right audience (targeting), 2) With the right message (content clarity), 3) Through the right channels (channel selection), 4) At the right time (timing). Each of these compounds. Poor targeting ruins great content. Great targeting amplifies good content."

    The coherent version presents a logical framework. The incoherent version jumps between disconnected ideas.

    The AI Penalty: Loss of Inclusion

    How does AI penalise thin content? Unlike Google ranking, where thin content might still rank if it has links, AI systems exclude thin content from recommendations.

    The Mechanism

    When an LLM generates a response to a query, it:

    1. Retrieves candidate sources (using ranking or relevance signals)
    2. Filters for quality (assessing whether the source has genuine substance)
    3. Ranks by utility (which sources provide the most useful information for answering the query?)
    4. Extracts and synthesises (pull information from the top-ranked sources)

    Thin content fails at steps 2 and 3. An LLM might retrieve your page based on keyword relevance but, upon reading it, immediately recognise it as thin and deprioritise it.
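
The four steps above can be sketched as a small filtering pipeline. Everything here is an assumption made for illustration: the `Source` fields, the scores, the thresholds, and the example URLs are hypothetical, and the internals of production AI systems are not public.

```python
from dataclasses import dataclass

@dataclass
class Source:
    url: str
    relevance: float  # hypothetical retrieval score, 0-1
    quality: float    # hypothetical substance score, 0-1
    utility: float    # hypothetical usefulness for this query, 0-1

def select_sources(candidates: list[Source], min_relevance: float = 0.5,
                   min_quality: float = 0.6, top_k: int = 2) -> list[Source]:
    """Return the sources that would be passed on to extraction and synthesis."""
    retrieved = [s for s in candidates if s.relevance >= min_relevance]   # step 1
    substantive = [s for s in retrieved if s.quality >= min_quality]      # step 2
    ranked = sorted(substantive, key=lambda s: s.utility, reverse=True)   # step 3
    return ranked[:top_k]                                                 # input to step 4

candidates = [
    Source("https://example.com/thin-page", relevance=0.9, quality=0.3, utility=0.4),
    Source("https://example.org/deep-guide", relevance=0.7, quality=0.9, utility=0.8),
    Source("https://example.net/off-topic", relevance=0.2, quality=0.9, utility=0.1),
]
selected = select_sources(candidates)
print([s.url for s in selected])
```

In this toy run the thin page is retrieved, because it is the most keyword-relevant, but it never reaches the ranking step.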

    Real-World Impact

    A company that:

    • Ranks position 2 in Google for "demand generation software" (has links, authority)
    • But produces thin, generic content about demand generation software

    ...might:

    • Continue ranking position 2 in Google (links and authority carry ranking)
    • But be included in only 10-20% of ChatGPT responses (the content-quality filter screens it out)

    Meanwhile, a competitor with:

    • Lower Google ranking (position 5)
    • But high-quality, substantive content

    ...might:

    • Rank position 5 in Google (fewer links)
    • But be included in 60-70% of ChatGPT responses (quality content gets included)

    This creates a paradox: traditional SEO tactics (links, authority) can maintain Google ranking even when content is thin. But AI recommendations reward content quality regardless of traditional SEO authority.

    Common Thin Content Patterns in B2B

    B2B companies frequently create thin content. Understanding common patterns helps you audit your own content.

    Pattern 1: Feature Lists Masquerading as Content

    Thin content example: "Platform X Features:

    • Multi-channel automation
    • Reporting and analytics
    • Integration with 500+ tools
    • AI-powered optimisation
    • Mobile app
    • 24/7 support"

    This is a feature list, not content. It has no analysis, no context, no help for the reader.

    Pattern 2: Topic-Level Content Without Depth

    Thin content example: "What is demand generation? Demand generation is the process of building awareness among potential customers. Demand generation helps you reach more prospects. You can use demand generation to build your business."

    This is a definition + repetition. No insight, no depth, no framework.

    Pattern 3: Competitive Comparisons Without Analysis

    Thin content example: "Platform A vs Platform B

                 Pricing     Integrations   Mobile app
    Platform A   $50/seat    300+           yes
    Platform B   $75/seat    200+           yes"

    This is a table. There's no analysis of when to choose A vs B, no explanation of what the differences mean for different use cases.

    Pattern 4: Case Studies Without Outcomes

    Thin content example: "Case Study: Company X

    Company X was in the demand generation space. They wanted to improve their results. They implemented our platform. They saw improvements."

    No metrics, no timeline, no specific outcomes, no learnings.

    Pattern 5: Blog Content to Support PPC

    Thin content example: "Need demand generation software? Try [Your Product]. We offer the best demand generation software. Our platform includes [features]. Get started with [Your Product] today."

    This is an ad, not content. It provides no value to someone researching demand generation.

    The Duplication Problem

    Content duplication is a specific form of thinness that merits its own section because it's so common in B2B content.

    Why Duplication Happens

    B2B companies often create duplicate or near-duplicate content for:

    • Different audience segments ("SaaS demand generation" vs "Enterprise demand generation")
    • Different marketing channels ("Blog post about demand gen" + "Whitepaper about demand gen")
    • Different product variations ("How to do X with Product A" + "How to do X with Product B")
    • Different keyword variations ("Demand generation strategy" + "Best demand generation strategy")

    Why LLMs Penalise Duplication

    When an LLM encounters duplicate content:

    1. Extraction efficiency drops: LLMs can extract information from the original; duplicates add no new information
    2. Attribution clarity drops: Which source is the "real" source? Duplicates confuse attribution
    3. Update friction increases: If you update one duplicate, you need to update all; duplicates get stale inconsistently
    4. Quality signals decrease: The fact that you created duplication suggests you don't have sufficient substance to fill a unique article

    Duplication Detection

    LLMs detect duplication even when content isn't identical:

    Near-duplicate 1: "Demand generation is the process of building awareness among potential customers through targeted campaigns."

    Near-duplicate 2: "Building awareness among potential customers through targeted campaigns is what demand generation accomplishes."

    Different wording, same information. LLMs recognise this as duplication.
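
Word-order changes like this are easy to catch even with a very simple measure. The sketch below uses Jaccard similarity on word sets (shared words divided by all distinct words) as an illustration; the `jaccard` helper is hypothetical, and catching paraphrases that swap in synonyms would need a semantic method that this sketch does not attempt.

```python
import re

def jaccard(text_a: str, text_b: str) -> float:
    """Shared distinct words divided by all distinct words across both texts."""
    words_a = set(re.findall(r"[a-z0-9']+", text_a.lower()))
    words_b = set(re.findall(r"[a-z0-9']+", text_b.lower()))
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

first = ("Demand generation is the process of building awareness among potential "
         "customers through targeted campaigns.")
second = ("Building awareness among potential customers through targeted campaigns "
          "is what demand generation accomplishes.")
unrelated = "Our pricing page lists three plans with annual discounts."

print(round(jaccard(first, second), 2))
print(round(jaccard(first, unrelated), 2))
```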

    Impact on Inclusion

    An LLM generating a response might retrieve 10 potential sources. If 4 of them are duplicates of each other, the LLM:

    • Ranks the original highest
    • May include 1-2 of the duplicates if space allows
    • Likely excludes the others

    If your domain has high duplication, your overall inclusion rate drops. The LLM assumes you don't have diverse, substantive content.

    How to Audit Your Content

    Before you remediate, you need to understand the scope of the problem.

    Step 1: Identify Thin Content Manually

    Read your top 50 pieces of content (by traffic or strategic importance). For each, ask:

    • Would I learn something new from this article? (Or is it a definition + repetition?)
    • Does this article provide specific, actionable information? (Or is it generic?)
    • Could I explain this topic to someone else using just this article? (Or would I need other sources?)
    • Does every paragraph add information? (Or is there padding and filler?)

    Score each piece:

    • 5 stars: Substantive, specific, original, valuable
    • 4 stars: Good content with some generic sections
    • 3 stars: Average content with mix of substance and filler
    • 2 stars: Thin content with minimal substance
    • 1 star: Extremely thin, mostly filler and padding

    Step 2: Detect Duplication Programmatically

    Use tools to find duplicate or near-duplicate content:

    1. Exact duplicates: Use Google Search Console to find pages marked as duplicates
    2. Near-duplicates: Use Copyscape or Screaming Frog to find similar content
    3. Semantic duplicates: Use a similarity tool (many SEO platforms have this) to find pages that cover the same topic

    Look for:

    • Blog post + whitepaper + case study on the same topic
    • Content for "Product A" + content for "Product B" that's 80% identical
    • "Beginner's guide" + "Expert guide" that cover the same ground
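
If you have the page text exported, a first-pass scan does not need a commercial tool. The following sketch compares every pair of pages with Python's standard-library `difflib.SequenceMatcher` at word level and flags pairs at or above 0.8, echoing the "80% identical" case above. The page paths and texts are made up for illustration, and pairwise comparison only suits a modest number of pages.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(pages: dict[str, str],
                    threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (url_a, url_b, ratio) for every page pair at or above the threshold."""
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2):
        # Compare word sequences rather than characters so small edits count as one change.
        matcher = SequenceMatcher(None, text_a.lower().split(),
                                  text_b.lower().split(), autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            flagged.append((url_a, url_b, round(ratio, 2)))
    return flagged

pages = {
    "/how-to-product-a": "How to build a nurture campaign with Product A: connect your "
                         "CRM, import contacts, write three emails, and schedule the send.",
    "/how-to-product-b": "How to build a nurture campaign with Product B: connect your "
                         "CRM, import contacts, write three emails, and schedule the send.",
    "/pricing-explained": "Pricing depends on seats and contract length, with discounts "
                          "for annual billing.",
}
flagged = near_duplicates(pages)
for url_a, url_b, ratio in flagged:
    print(f"{url_a} and {url_b} look near-identical (ratio {ratio})")
```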

    Step 3: Identify Content Gaps

    Which topics do you cover thinly that competitors cover substantively?

    1. Run searches for your key topics in ChatGPT
    2. Note what sources it cites
    3. Compare competitor content depth to yours
    4. Identify topics where you're weak
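
Once you have noted the cited sources by hand, a short tally makes the gaps easier to see. This sketch assumes you have recorded the cited URLs per query yourself; the `cited_domains` helper, the queries, and the domains are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def cited_domains(citations_by_query: dict[str, list[str]]) -> Counter:
    """Count, per domain, how many queries cited it at least once."""
    counts: Counter = Counter()
    for urls in citations_by_query.values():
        # Use a set so a domain cited twice for one query is counted once.
        domains = {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}
        counts.update(domains)
    return counts

# Manually recorded observations (illustrative data only).
observed = {
    "what is demand generation": [
        "https://www.example.com/guide",
        "https://competitor.example/deep-dive",
    ],
    "demand generation roi": [
        "https://competitor.example/roi",
        "https://competitor.example/benchmarks",
    ],
}
tally = cited_domains(observed)
for domain, n in tally.most_common():
    print(f"{domain}: cited in {n} of {len(observed)} queries")
```

Topics where a competitor's domain appears and yours does not are the first candidates for a depth comparison.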

    Step 4: Quantify the Problem

    Estimate:

    • % of your content that scores 1-2 stars (thin content)
    • % of your content that's duplicate or near-duplicate
    • % of your strategic topics that are covered thinly

    This gives you a baseline for improvement.
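
The three estimates are straightforward to compute once the audit is in a spreadsheet or list. A minimal sketch, assuming a hypothetical `AuditRow` record that holds the star score from Step 1, the duplicate flag from Step 2, and whether the page covers a strategic topic:

```python
from dataclasses import dataclass

@dataclass
class AuditRow:
    url: str
    stars: int          # 1-5 score from the manual review in Step 1
    is_duplicate: bool  # flagged as duplicate or near-duplicate in Step 2
    strategic: bool     # covers one of your strategic topics

def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is nothing to count."""
    return round(100 * part / whole, 1) if whole else 0.0

def baseline(rows: list[AuditRow]) -> dict[str, float]:
    strategic = [r for r in rows if r.strategic]
    return {
        "thin_pct": pct(sum(r.stars <= 2 for r in rows), len(rows)),
        "duplicate_pct": pct(sum(r.is_duplicate for r in rows), len(rows)),
        "strategic_thin_pct": pct(sum(r.stars <= 2 for r in strategic), len(strategic)),
    }

rows = [
    AuditRow("/guide", stars=5, is_duplicate=False, strategic=True),
    AuditRow("/what-is-demand-gen", stars=2, is_duplicate=False, strategic=True),
    AuditRow("/best-strategy", stars=1, is_duplicate=True, strategic=False),
    AuditRow("/strategy", stars=3, is_duplicate=True, strategic=False),
]
report = baseline(rows)
print(report)
```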

    Remediation Strategies

    Once you've identified thin content, you have several remediation options.

    Option 1: Consolidate Duplicates

    If you have three articles covering demand generation software — one for SMBs, one for mid-market, one for enterprise — you have two choices:

    Choice A: Merge into one comprehensive guide

    • One article: "Demand generation software: comparing platforms for SMB, mid-market, and enterprise"
    • Sections for each segment
    • One canonical source

    Choice B: Make them complementary, not duplicate

    • "Demand generation software for SMBs: 5 platforms under $50/seat"
    • "Demand generation software for mid-market: evaluating ROI for $20K-50K/month budgets"
    • "Demand generation software for enterprise: multi-tenant, multi-region requirements"

    Each article is now distinct, covering different requirements, different decision criteria. Not duplication; segmentation.

    Option 2: Depth-Add Remediation

    Take thin content and add substance:

    Thin: "5 Ways to Improve Demand Generation"

    Remediated: "5 Ways to Improve Demand Generation (And Why Most Teams Do Only 1)"

    The thin article had five points with single-sentence descriptions. The remediated version has:

    • 5 points with 200-word deep dives
    • Analysis of why each is often neglected
    • Real examples and case studies
    • Timeline for implementation
    • Expected impact (quantified)

    Option 3: Topic Cluster Approach

    Replace thin content with a topical cluster:

    Instead of: One thin article "Demand generation ROI"

    Create:

    • Pillar article: "Demand generation ROI: comprehensive measurement framework" (3000 words)
    • Child 1: "How to calculate demand generation attribution" (1500 words)
    • Child 2: "Demand generation ROI by industry" (2000 words)
    • Child 3: "First-year vs multi-year demand generation ROI" (1500 words)

    Each piece is substantive, non-duplicate, and builds on the pillar. LLMs now see a dense, authoritative cluster instead of thin scattered content.

    Option 4: Strategic Deletion

    Some thin content should be deleted, not improved. If you have:

    • Thin content that doesn't rank in Google
    • Thin content that doesn't drive conversions
    • Thin content that duplicates what competitors do better

    ...delete it. Deleting bad content is better than leaving it stale. Redirect the URL to the best alternative content.

    Option 5: Archive and Redirect

    For content that was once useful but is now thin/stale:

    • Mark it as archived ("This article was published in 2020 and is no longer maintained")
    • Redirect traffic to current equivalent content
    • Keep it accessible for historical reference, but don't promote it

    Building a Thin Content Prevention Framework

    Fixing existing thin content is important, but preventing it in the future is critical.

    Framework 1: Content Substance Checklist

    Before publishing, every article should meet:

    • Answers a specific question a prospect would ask (not a vague topic)
    • Provides at least 3 specific, actionable insights
    • Includes concrete examples or data
    • Has clear, distinct sections (not repetitive)
    • Cites sources for claims
    • Differentiates from existing competitor content
    • Is 2,000+ words OR is a modular cluster component with a clear purpose
    • Every section adds new information (no padding)

    Framework 2: Substance Scoring

    Develop a simple scoring system for content reviews:

    Information Density (1-5):

    • 1: Mostly padding, minimal information
    • 3: Some new info, some generic
    • 5: High-density, specific, actionable

    Specificity (1-5):

    • 1: Completely generic
    • 3: Some specific examples
    • 5: Specific data, metrics, context

    Originality (1-5):

    • 1: Duplicate of existing content
    • 3: Some unique angle on familiar topic
    • 5: Novel framework or analysis

    Average score must be 4+ to publish.
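
The publishing rule can be written as a small gate. This is a sketch of the scoring described above; the function name `can_publish` is hypothetical.

```python
from statistics import mean

def can_publish(information_density: int, specificity: int, originality: int,
                threshold: float = 4.0) -> bool:
    """True when the average of the three 1-5 scores meets the threshold."""
    scores = (information_density, specificity, originality)
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each score must be between 1 and 5")
    return mean(scores) >= threshold

print(can_publish(5, 4, 3))  # average 4.0
print(can_publish(4, 4, 3))  # average about 3.67
```

Note that a plain average lets one strong dimension offset a weak one: scores of 5, 5, and 2 still average 4. If that is not what you want, consider adding a minimum score per dimension.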

    Framework 3: Topic Architecture

    Plan content as clusters, not isolated articles:

    Topic: "Demand Generation"

    → Pillar (3000 words): "Complete guide to demand generation"
    → Child articles (1500+ words):

    • vs lead generation (comparison)
    • for SaaS specifically (vertical)
    • team structure (how-to)
    • ROI measurement (analysis)
    • tools (comparison)

    Each piece is substantive and non-duplicate because each serves a specific, distinct purpose.

    Framework 4: Update Schedule

    Thin content often results from content being outdated. Implement:

    • Review all content every 12 months
    • Update with fresh examples, data, case studies
    • Consolidate or delete if updates reveal thinness
    • Refresh publication date when substantially updated
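
The 12-month review rule is easy to automate if you track a last-reviewed date per page. A minimal sketch, assuming a hypothetical mapping of URL to review date and treating 12 months as 365 days:

```python
from datetime import date, timedelta

def due_for_review(last_reviewed: dict[str, date], today: date,
                   max_age_days: int = 365) -> list[str]:
    """Return URLs whose last review is at least `max_age_days` old, sorted by URL."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(url for url, reviewed in last_reviewed.items() if reviewed <= cutoff)

# Illustrative data only.
last_reviewed = {
    "/demand-generation-guide": date(2025, 1, 15),
    "/roi-by-industry": date(2025, 12, 1),
    "/attribution": date(2025, 3, 31),
}
overdue = due_for_review(last_reviewed, today=date(2026, 3, 31))
print(overdue)
```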

    Frequently Asked Questions

    Archiving signals to users that the content is old, but LLMs may still include it if they encounter it. Better to delete and redirect.

    Concise: "Demand generation is building awareness among prospects before they enter active buying mode. Timeframe: 12-24 months. ROI: typically 4:1 to 8:1. Fits best for: companies with 18-month+ sales cycles." This is concise but specific. Every sentence adds information.
    Thin: "Demand generation is important. You should do demand generation. Many companies find demand generation valuable." This is thin, not concise. There's no new information in subsequent sentences.

    Depends on strategic value. High-traffic but thin content: update and strengthen. Low-traffic, thin content: delete and redirect. Mid-tier: consolidate with better content on the same topic.

    Possibly in the short term (one article instead of three). But long-term, consolidated substantive content will outrank thin duplicates. And for AI inclusion, consolidation helps significantly.

    Content review process with substance checklist. Have someone else read every article and ask: "Would I include this in ChatGPT recommendations?" If not, it needs more work.

    Yes, but carefully. AI (ChatGPT, Claude) can draft content structure and fill in frameworks. But you need to add:
    • Specific examples and data
    • Your genuine expertise and perspective
    • Critical analysis and depth
    AI-generated content is often thin by default. Use it as a starting point, not final output.

    Ross Williams

    Ross Williams is the founder of Fortitude Media, specialising in AI visibility and content strategy for B2B companies.
