Why AI Penalises Thin Content and How to Fix It
LLMs detect thin, duplicated, and padded content at scale. Learn how AI systems identify low-quality content and the remediation framework to improve...

Summary: Large language models can directly assess content quality in ways that Google's algorithm struggled to do for years. AI systems instantly recognise whether content is thin, duplicated, padded with filler, or genuinely substantive. This creates a dramatic shift: low-quality content that might have ranked through traditional SEO tricks is now worthless to AI systems. Understanding what AI considers "thin" and how to remediate your content library is critical for visibility in the AI era.
What LLMs Consider "Thin Content"
Thin content is content that offers little genuine value to readers. For decades, Google struggled to identify thin content algorithmically. Google relied on proxies: if a page had few inbound links, it was probably thin. If it had high bounce rates, it was probably thin. If it was short, it might be thin.
LLMs don't use proxies. They directly assess whether content is substantive or hollow.
Thin Content Categories That LLMs Penalise
1. Definitional Content Without Depth
Thin: "Demand generation is the process of building awareness for a product or service among potential customers. It involves creating content and marketing campaigns to attract leads."
Substantive: "Demand generation is the systematic process of creating awareness and interest in your product among a target audience. Unlike lead generation, which typically focuses on converting ready-to-buy prospects, demand generation casts a wider net across the consideration funnel. Demand generation campaigns might reach prospects 12 months before they're ready to buy, planting seeds for future conversion. This requires different content types (awareness-focused, educational) and different measurement approaches (brand lift, consideration metrics) than lead generation. The distinction matters because demand generation typically has longer payoff periods (18+ months) while lead generation focuses on 90-day conversion windows."
The thin version states the definition. The substantive version explains the concept, contrasts it with related concepts, explains the business rationale, and contextualises timeframes.
LLMs reading the thin version extract minimal unique information. LLMs reading the substantive version extract:
- A clear definition
- Distinction from related concepts
- Appropriate use cases and time horizons
- Measurement approaches
2. Listicles Without Analysis
Thin: "5 Demand Generation Best Practices
- Create valuable content
- Use multiple channels
- Track your metrics
- Build partnerships
- Optimise continuously"
Substantive: "5 Demand Generation Best Practices
- Create valuable content: Most B2B demand generation content fails because it's written for the company's priorities (product features) rather than the prospect's priorities (their problems). Substantive content answers: 'What challenges does this prospect face?' not 'What is our product?'
- Use multiple channels: Single-channel demand gen (email-only, content-only, events-only) reaches only a fraction of your target audience. Map your target personas and identify where they consume information: LinkedIn, industry publications, podcasts, conferences, etc.
[Continues with similar depth for each item]"
The thin version is a bullet list with no context or explanation. LLMs reading it extract very little. The substantive version is a listicle that actually explains why each practice matters.
3. Content Padding and Filler
Thin: "Demand generation is an important strategy for B2B companies. Many B2B companies use demand generation. B2B companies of all sizes need demand generation. In today's competitive landscape, demand generation is essential. Let's explore demand generation further..."
This is the same information repeated, padded with transitional phrases that add no value. LLMs immediately recognise this as padding.
4. Keyword-Stuffed Content
Thin: "Demand generation software helps with demand generation. The best demand generation software includes demand generation features. Demand generation platforms offer demand generation solutions..."
Repetitive keyword usage without new information is a clear signal of thinness.
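A rough way to see this pattern in your own pages is to measure how much of the text is taken up by one phrase. The sketch below is a crude proxy for auditing, not a description of how any LLM scores text; the function name `keyword_density` is an assumption made for illustration.

```python
import re

def keyword_density(text: str, phrase: str) -> float:
    """Share of the words in `text` that belong to occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as a run of words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

stuffed = (
    "Demand generation software helps with demand generation. "
    "The best demand generation software includes demand generation features. "
    "Demand generation platforms offer demand generation solutions."
)
print(f"{keyword_density(stuffed, 'demand generation'):.0%} of words are the keyword")
```

In the stuffed example more than half of the words belong to the keyword phrase. Any cut-off you choose for flagging pages is a judgment call rather than a known threshold.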
5. Generic, Boilerplate Content
Thin: "Demand generation is a complex topic. Many companies struggle with demand generation. It's important to choose the right demand generation strategy. Working with a demand generation agency can help you implement demand generation. There are many demand generation tools available."
This could describe any topic. It has no specific information, no insight, no value.
6. Duplicated or Near-Duplicate Content
Thin: Multiple pages on your site saying nearly the same thing, with minor variations:
- "Demand generation strategy"
- "Guide to demand generation strategy"
- "Best demand generation strategy"
All three cover the same ground with minimal variation.
How LLMs Detect Content Quality
LLMs have multiple mechanisms for assessing whether content is thin or substantive.

1. Information Density Analysis
LLMs can measure "information density" — how much unique, useful information is present per word.
High density: "In our analysis of 100 mid-market SaaS companies, demand generation spending averaged 12% of total marketing budget, with ROI ranging from 3:1 to 7:1 based on sales cycle length. Early-stage companies (ARR < $5M) allocated 18% of budget; mature companies (ARR > $50M) allocated 8%."
Low density: "Demand generation spending varies. Some companies spend more on demand generation. The amount you spend depends on your company and your needs. You should allocate enough budget for demand generation."
The high-density example provides specific numbers and contextual variation. The low-density example provides no specific information.
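One simple way to approximate this contrast in an audit is to compare how many distinct content words a passage contains relative to its length. This is a minimal sketch under stated assumptions: the stopword list is deliberately tiny and hypothetical, the name `distinct_content_ratio` is invented here, and the ratio is only a proxy (it is sensitive to text length, so compare passages of similar size).

```python
import re

# A deliberately small stopword list, for illustration only.
STOPWORDS = {
    "a", "an", "and", "for", "from", "in", "of", "on", "or", "our",
    "some", "more", "the", "to", "with", "you", "your", "should",
}

def distinct_content_ratio(text: str) -> float:
    """Distinct non-stopword tokens divided by the total number of tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    distinct = {t for t in tokens if t not in STOPWORDS}
    return len(distinct) / len(tokens)

high = (
    "In our analysis of 100 mid-market SaaS companies, demand generation "
    "spending averaged 12% of total marketing budget, with ROI ranging from "
    "3:1 to 7:1 based on sales cycle length. Early-stage companies (ARR < $5M) "
    "allocated 18% of budget; mature companies (ARR > $50M) allocated 8%."
)
low = (
    "Demand generation spending varies. Some companies spend more on demand "
    "generation. The amount you spend depends on your company and your needs. "
    "You should allocate enough budget for demand generation."
)
print(distinct_content_ratio(high) > distinct_content_ratio(low))
```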
2. Concept Repetition Detection
LLMs track whether the same concept is repeated without new information added.
Detected as thin:
- "Demand generation is important. It's critical to have a demand generation strategy. You need to prioritise demand generation. Demand generation should be part of your marketing plan."
Detected as substantive:
- "Demand generation should represent 15-20% of your total marketing budget. Beyond that allocation, other channels (product-led growth, partnerships) may be more efficient. Below that allocation, your brand isn't reaching enough of your target audience in early research stages."
The second example provides new information with each sentence.
3. Specificity Assessment
LLMs detect whether content is specific or generic.
Specific: "In demand generation for enterprise SaaS (contracts >$100K), LinkedIn campaigns to IT buyers averaged 0.8% CTR and $200 CPC in Q4 2025. Comparable figures for mid-market (contracts $20-100K) show 1.3% CTR and $140 CPC."
Generic: "You should run campaigns on LinkedIn. LinkedIn campaigns are effective. Different audiences respond differently to LinkedIn campaigns."
The specific version provides measurable, contextual information. The generic version provides no information beyond "LinkedIn is useful."
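For a first-pass audit, the presence of figures is a cheap signal of specificity. The sketch below counts tokens that contain a digit per 100 words; the function name `figures_per_100_words` is an assumption, and a page can of course be specific without numbers, so treat a low score as a prompt to read the page rather than a verdict.

```python
def figures_per_100_words(text: str) -> float:
    """Number of whitespace-separated tokens containing a digit, per 100 words."""
    words = text.split()
    if not words:
        return 0.0
    with_digits = sum(1 for w in words if any(ch.isdigit() for ch in w))
    return 100 * with_digits / len(words)

specific = (
    "In demand generation for enterprise SaaS (contracts >$100K), LinkedIn "
    "campaigns to IT buyers averaged 0.8% CTR and $200 CPC in Q4 2025. "
    "Comparable figures for mid-market (contracts $20-100K) show 1.3% CTR "
    "and $140 CPC."
)
generic = (
    "You should run campaigns on LinkedIn. LinkedIn campaigns are effective. "
    "Different audiences respond differently to LinkedIn campaigns."
)
print(figures_per_100_words(specific), figures_per_100_words(generic))
```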
4. Source-Alignment Detection
LLMs check whether claims are supported by context.
Unsupported: "Demand generation ROI is typically 8:1. Most companies see significant results from demand generation. Demand generation is proven to work."
Supported: "Based on our analysis of 150 companies, demand generation typically achieves 4:1 to 8:1 ROI, with the variation depending on three factors: 1) Sales cycle length (longer = lower ROI in year one, higher in year two+), 2) Market maturity (established markets = higher ROI), and 3) Team experience (experienced teams = 2-3x higher ROI in years 1-2)."
The supported version explains its claims with evidence and context.
5. Logical Coherence Assessment
LLMs check whether ideas flow logically.
Incoherent: "Demand generation is important. Your website should be fast. Content is key to demand gen. SEO and PPC are different. You should use both channels."
Coherent: "Demand generation succeeds when: 1) You reach the right audience (targeting), 2) With the right message (content clarity), 3) Through the right channels (channel selection), 4) At the right time (timing). Each of these compounds. Poor targeting ruins great content. Great targeting amplifies good content."
The coherent version presents a logical framework. The incoherent version jumps between disconnected ideas.
The AI Penalty: Loss of Inclusion
How does AI penalise thin content? Unlike Google ranking, where thin content might still rank if it has links, AI systems exclude thin content from recommendations.
The Mechanism
When an LLM generates a response to a query, it:
1. Retrieves candidate sources (using ranking or relevance signals)
2. Filters for quality (assessing whether the source has genuine substance)
3. Ranks by utility (which sources provide the most useful information for answering the query?)
4. Extracts and synthesises (pulls information from the top-ranked sources)
Thin content fails at steps 2 and 3. An LLM might retrieve your page based on keyword relevance but, upon reading it, immediately recognise it as thin and deprioritise it.
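The four steps above can be sketched as a small pipeline. This is a toy model for intuition only: the `Source` fields, the score values, the `quality_floor` and `min_relevance` cut-offs, and the function name `select_sources` are all hypothetical, and real AI systems do not publish their selection logic in this form.

```python
from dataclasses import dataclass

@dataclass
class Source:
    url: str
    relevance: float  # hypothetical match score for the query, 0 to 1
    quality: float    # hypothetical substance score, 0 to 1
    utility: float    # hypothetical usefulness for answering the query, 0 to 1

def select_sources(candidates, min_relevance=0.3, quality_floor=0.5, top_k=2):
    """Return the sources that would be passed on to extraction and synthesis."""
    retrieved = [s for s in candidates if s.relevance >= min_relevance]   # retrieve
    substantive = [s for s in retrieved if s.quality >= quality_floor]    # filter
    ranked = sorted(substantive, key=lambda s: s.utility, reverse=True)   # rank
    return ranked[:top_k]                                                 # extract from these

candidates = [
    Source("/thin-page", relevance=0.9, quality=0.2, utility=0.3),
    Source("/deep-guide", relevance=0.7, quality=0.9, utility=0.8),
    Source("/benchmark-report", relevance=0.6, quality=0.8, utility=0.9),
]
print([s.url for s in select_sources(candidates)])
```

In this toy run the thin page has the highest relevance score but never reaches the ranking step, which is the failure mode described above.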
Real-World Impact
A company that:
- Ranks position 2 in Google for "demand generation software" (has links, authority)
- But produces thin, generic content about demand generation software
...might:
- Continue ranking position 2 in Google (links and authority carry ranking)
- But be included in only 10-20% of ChatGPT responses (filtered out on content quality)
Meanwhile, a competitor with:
- Lower Google ranking (position 5)
- But high-quality, substantive content
...might:
- Rank position 5 in Google (fewer links)
- But be included in 60-70% of ChatGPT responses (quality content gets included)
This creates a paradox: traditional SEO tactics (links, authority) can maintain Google ranking even when content is thin. But AI recommendations reward content quality regardless of traditional SEO authority.
Common Thin Content Patterns in B2B
B2B companies frequently create thin content. Understanding common patterns helps you audit your own content.

Pattern 1: Feature Lists Masquerading as Content
Thin content example: "Platform X Features:
- Multi-channel automation
- Reporting and analytics
- Integration with 500+ tools
- AI-powered optimisation
- Mobile app
- 24/7 support"
This is a feature list, not content. It has no analysis, no context, no help for the reader.
Pattern 2: Topic-Level Content Without Depth
Thin content example: "What is demand generation? Demand generation is the process of building awareness among potential customers. Demand generation helps you reach more prospects. You can use demand generation to build your business."
This is a definition + repetition. No insight, no depth, no framework.
Pattern 3: Competitive Comparisons Without Analysis
Thin content example: "Platform A vs Platform B
- Platform A: Pricing $50/seat, Integrations 300+, Mobile app yes
- Platform B: Pricing $75/seat, Integrations 200+, Mobile app yes"
This is a table. There's no analysis of when to choose A vs B, no explanation of what the differences mean for different use cases.
Pattern 4: Case Studies Without Outcomes
Thin content example: "Case Study: Company X. Company X was in the demand generation space. They wanted to improve their results. They implemented our platform. They saw improvements."
No metrics, no timeline, no specific outcomes, no learnings.
Pattern 5: Blog Content to Support PPC
Thin content example: "Need demand generation software? Try [Your Product]. We offer the best demand generation software. Our platform includes [features]. Get started with [Your Product] today."
This is an ad, not content. It provides no value to someone researching demand generation.
The Duplication Problem
Content duplication is a specific form of thinness that merits its own section because it's so common in B2B content.
Why Duplication Happens
B2B companies often create duplicate or near-duplicate content for:
- Different audience segments ("SaaS demand generation" vs "Enterprise demand generation")
- Different marketing channels ("Blog post about demand gen" + "Whitepaper about demand gen")
- Different product variations ("How to do X with Product A" + "How to do X with Product B")
- Different keyword variations ("Demand generation strategy" + "Best demand generation strategy")
Why LLMs Penalise Duplication
When an LLM encounters duplicate content:
- Extraction efficiency drops: LLMs can extract information from the original; duplicates add no new information
- Attribution clarity drops: Which source is the "real" source? Duplicates confuse attribution
- Update friction increases: If you update one duplicate, you need to update all; duplicates get stale inconsistently
- Quality signals decrease: The fact that you created duplication suggests you don't have sufficient substance to fill a unique article
Duplication Detection
LLMs detect duplication even when content isn't identical:
Near-duplicate 1: "Demand generation is the process of building awareness among potential customers through targeted campaigns."
Near-duplicate 2: "Building awareness among potential customers through targeted campaigns is what demand generation accomplishes."
Different wording, same information. LLMs recognise this as duplication.
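The two sentences above share almost all of their vocabulary, which even a simple word-overlap measure picks up. The sketch below uses Jaccard similarity over word sets; it is a stand-in chosen for illustration, not the method an LLM uses, and it will miss paraphrases that use different vocabulary. The helper names `word_set` and `jaccard` are assumptions.

```python
import re

def word_set(text: str) -> set:
    """Lower-cased set of the words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words across both texts."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

first = ("Demand generation is the process of building awareness among "
         "potential customers through targeted campaigns.")
second = ("Building awareness among potential customers through targeted "
          "campaigns is what demand generation accomplishes.")
unrelated = "Review all content every 12 months."

print(jaccard(first, second), jaccard(first, unrelated))
```

The reordered pair scores far higher than the unrelated pair, even though the sentences are not character-for-character identical.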
Impact on Inclusion
An LLM generating a response might retrieve 10 potential sources. If 4 of them are duplicates of each other, the LLM:
- Ranks the original highest
- May include 1-2 of the duplicates if space allows
- Likely excludes the others
If your domain has high duplication, your overall inclusion rate drops. The LLM assumes you don't have diverse, substantive content.
How to Audit Your Content
Before you remediate, you need to understand the scope of the problem.
Step 1: Identify Thin Content Manually
Read your top 50 pieces of content (by traffic or strategic importance). For each, ask:
- Would I learn something new from this article? (Or is it a definition + repetition?)
- Does this article provide specific, actionable information? (Or is it generic?)
- Could I explain this topic to someone else using just this article? (Or would I need other sources?)
- Does every paragraph add information? (Or is there padding and filler?)
Score each piece:
- 5 stars: Substantive, specific, original, valuable
- 4 stars: Good content with some generic sections
- 3 stars: Average content with a mix of substance and filler
- 2 stars: Thin content with minimal substance
- 1 star: Extremely thin, mostly filler and padding
Step 2: Detect Duplication Programmatically
Use tools to find duplicate or near-duplicate content:
- Exact duplicates: Use Google Search Console to find pages marked as duplicates
- Near-duplicates: Use Copyscape or Screaming Frog to find similar content
- Semantic duplicates: Use a similarity tool (many SEO platforms have this) to find pages that cover the same topic
Look for:
- Blog post + whitepaper + case study on the same topic
- Content for "Product A" + content for "Product B" that's 80% identical
- "Beginner's guide" + "Expert guide" that cover the same ground
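If you have page text exported (for example from a crawl), a pairwise scan can surface candidates such as the "80% identical" product pages. The sketch below uses `difflib.SequenceMatcher` from the Python standard library on word sequences; the URLs, sample texts, 0.8 threshold, and the function name `near_duplicate_pairs` are illustrative assumptions, and pairwise comparison becomes slow on large sites.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicate_pairs(pages: dict, threshold: float = 0.8) -> list:
    """Return (url_a, url_b, ratio) for page pairs at or above the threshold."""
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        # Compare word sequences; autojunk is disabled so long pages are not
        # affected by difflib's popularity heuristic.
        matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            flagged.append((url_a, url_b, round(ratio, 2)))
    return flagged

# Hypothetical pages for illustration.
pages = {
    "/guides/product-a": "How to build a nurture sequence with Product A in five steps",
    "/guides/product-b": "How to build a nurture sequence with Product B in five steps",
    "/pricing": "Pricing and packaging overview for annual contracts",
}
for pair in near_duplicate_pairs(pages):
    print(pair)
```

Only the two product guides are flagged here; the pricing page shares no wording with either.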
Step 3: Identify Content Gaps
Which topics do you cover thinly that competitors cover substantively?
- Run searches for your key topics in ChatGPT
- Note what sources it cites
- Compare competitor content depth to yours
- Identify topics where you're weak
Step 4: Quantify the Problem
Estimate:
- % of your content that scores 1-2 stars (thin content)
- % of your content that's duplicate or near-duplicate
- % of your strategic topics that are covered thinly
This gives you a baseline for improvement.
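The first two percentages fall straight out of the audit data. A minimal sketch, assuming you have recorded a star score and a duplicate flag per page; the `AuditRow` structure, the sample rows, and the function name `audit_baseline` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AuditRow:
    url: str
    stars: int          # 1-5 score from the manual review in Step 1
    is_duplicate: bool  # flagged by the duplication check in Step 2

def audit_baseline(rows: list) -> dict:
    """Percentage of pages that are thin (1-2 stars) and that are duplicates."""
    total = len(rows)
    if total == 0:
        return {"thin_pct": 0.0, "duplicate_pct": 0.0}
    thin = sum(1 for r in rows if r.stars <= 2)
    dupes = sum(1 for r in rows if r.is_duplicate)
    return {"thin_pct": 100 * thin / total, "duplicate_pct": 100 * dupes / total}

# Hypothetical audit results for illustration.
rows = [
    AuditRow("/blog/what-is-demand-gen", stars=2, is_duplicate=True),
    AuditRow("/blog/demand-gen-roi", stars=5, is_duplicate=False),
    AuditRow("/blog/best-practices", stars=1, is_duplicate=False),
    AuditRow("/blog/team-structure", stars=3, is_duplicate=False),
]
print(audit_baseline(rows))
```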
Remediation Strategies
Once you've identified thin content, you have several remediation options.
Option 1: Consolidate Duplicates
If you have three articles covering demand generation software — one for SMBs, one for mid-market, one for enterprise — you have two choices:
Choice A: Merge into one comprehensive guide
- One article: "Demand generation software: comparing platforms for SMB, mid-market, and enterprise"
- Sections for each segment
- One canonical source
Choice B: Make them complementary, not duplicate
- "Demand generation software for SMBs: 5 platforms under $50/seat"
- "Demand generation software for mid-market: evaluating ROI for $20K-50K/month budgets"
- "Demand generation software for enterprise: multi-tenant, multi-region requirements"
Each article is now distinct, covering different requirements, different decision criteria. Not duplication; segmentation.
Option 2: Depth-Add Remediation
Take thin content and add substance:
Thin: "5 Ways to Improve Demand Generation"
Remediated: "5 Ways to Improve Demand Generation (And Why Most Teams Do Only 1)"
The thin article had 5 points with single-sentence descriptions. The remediated version has:
- 5 points with 200-word deep dives
- Analysis of why each is often neglected
- Real examples and case studies
- Timeline for implementation
- Expected impact (quantified)
Option 3: Topic Cluster Approach
Replace thin content with a topical cluster:
Instead of: One thin article "Demand generation ROI"
Create:
- Pillar article: "Demand generation ROI: comprehensive measurement framework" (3000 words)
- Child 1: "How to calculate demand generation attribution" (1500 words)
- Child 2: "Demand generation ROI by industry" (2000 words)
- Child 3: "First-year vs multi-year demand generation ROI" (1500 words)
Each piece is substantive, non-duplicate, and builds on the pillar. LLMs now see a dense, authoritative cluster instead of thin scattered content.
Option 4: Strategic Deletion
Some thin content should be deleted, not improved. If you have:
- Thin content that doesn't rank in Google
- Thin content that doesn't drive conversions
- Thin content that duplicates what competitors do better
...delete it. Deleting bad content is better than leaving it stale. Redirect the URL to the best alternative content.
Option 5: Archive and Redirect
For content that was once useful but is now thin/stale:
- Mark it as archived ("This article was published in 2020 and is no longer maintained")
- Redirect traffic to current equivalent content
- Keep it accessible for historical reference, but don't promote it
Building a Thin Content Prevention Framework
Fixing existing thin content is important, but preventing it in the future is critical.
Framework 1: Content Substance Checklist
Before publishing, every article should meet:
- Answers a specific question a prospect would ask (not a vague topic)
- Provides at least 3 specific, actionable insights
- Includes concrete examples or data
- Has clear, distinct sections (not repetitive)
- Cites sources for claims
- Differentiates from existing competitor content
- Is 2,000+ words, or is a modular cluster component with a clear purpose
- Every section adds new information (no padding)
Framework 2: Substance Scoring
Develop a simple scoring system for content reviews:
Information Density (1-5):
- 1: Mostly padding, minimal information
- 3: Some new info, some generic
- 5: High-density, specific, actionable
Specificity (1-5):
- 1: Completely generic
- 3: Some specific examples
- 5: Specific data, metrics, context
Originality (1-5):
- 1: Duplicate of existing content
- 3: Some unique angle on familiar topic
- 5: Novel framework or analysis
Average score must be 4+ to publish.
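The publishing gate is simple arithmetic over the three scores. A minimal sketch, assuming each dimension is scored as a whole number from 1 to 5; the function name `passes_review` is an assumption.

```python
def passes_review(density: int, specificity: int, originality: int,
                  threshold: float = 4.0) -> bool:
    """True when the average of the three 1-5 scores meets the threshold."""
    scores = (density, specificity, originality)
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each score must be between 1 and 5")
    return sum(scores) / len(scores) >= threshold

print(passes_review(5, 4, 3))  # average is exactly 4.0
print(passes_review(4, 4, 3))  # average is about 3.67
```

Note that an average lets one weak dimension be offset by a strong one. If you would rather block any article that scores low on originality, for example, add a per-dimension minimum as well.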
Framework 3: Topic Architecture
Plan content as clusters, not isolated articles:
Topic: "Demand Generation"
Pillar (3000 words): "Complete guide to demand generation"
Child articles (1500+ words):
- vs lead generation (comparison)
- for SaaS specifically (vertical)
- team structure (how-to)
- ROI measurement (analysis)
- tools (comparison)
Each piece is substantive and non-duplicate because each serves a specific, distinct purpose.
Framework 4: Update Schedule
Thin content often results from content being outdated. Implement:
- Review all content every 12 months
- Update with fresh examples, data, case studies
- Consolidate or delete if updates reveal thinness
- Refresh publication date when substantially updated
Ross Williams
Ross Williams is the founder of Fortitude Media, specialising in AI visibility and content strategy for B2B companies.


