Ghost Citations, the AI Slop Loop & What's Driving 69 Million Crawler Visits

Search Digest | Marc Levy | Week of 27 Apr 2026 | Issue #009

The ghost citation finding from Kevin Indig is the one that changes how you think about GEO measurement. His analysis of 3,981 domains across 115 prompts, 14 countries, and 4 AI engines finds that 61.7% of LLM citations are ghost citations, meaning the AI links to your page but never says your name in the answer. Being cited and being known are not the same thing. The split between ChatGPT (cites 87%, names only 20.7%) and Gemini (cites 21.4%, names 83.7%) adds another layer: your visibility strategy has to account for which engine your audience uses, not just whether you appear at all.

The Duda crawler study connects directly to this. Of 68.9 million AI crawler visits in February 2026, OpenAI accounted for 81%, Anthropic 16.6%, and Google just 0.6%. More than half of that crawling is User Fetch, meaning real-time content retrieval for live answers, not training or discovery. Sites with structured data integrations like Yext and Google Business Profile were crawled at rates roughly 34 to 39 percentage points higher than those without. The relationship between AI crawler frequency and human traffic (crawled sites average 3.2x as many sessions) suggests that getting into the AI retrieval loop may have downstream effects on organic visibility too.

Lily Ray's AI Slop Loop piece adds the quality dimension. When AI systems treat citation volume as a proxy for accuracy, fabrications that get amplified across enough AI-generated content pipelines come to be treated as verified facts. That is the structural problem behind Google AI Overviews being technically correct but ungrounded in 56% of cases. The March core update data from SE Ranking and the bottom-of-funnel content shift round out the picture on the traditional search side: the update was nearly twice as disruptive as December's, and the content that is holding up best is the kind AI Overviews rarely touch.

Kevin Indig / Growth Memo

The Ghost Citation Problem

Kevin Indig defines a ghost citation as what happens when an AI system uses your content and provides a source link but never mentions your brand name in the answer text. His analysis of 3,981 domains across 115 prompts, 14 countries, and 4 AI engines finds that 74.9% of domains received citations, but only 38.3% were mentioned by name. Of all citations in the dataset, 61.7% were ghost citations: linked, anonymous, and invisible to the user unless they clicked through to check. Only 13.2% of appearances resulted in both a citation and a mention.

The behaviour varies sharply by AI engine. ChatGPT cites 87% of the time but names brands in just 20.7% of cases. Gemini does the opposite: it names brands in 83.7% of cases but only cites 21.4%. The content type matters too. Comparative and evaluative content, pages that explicitly compare options or recommend one over another, generates brand mentions. Informational content feeds AI engines anonymously. The geographic variation is also notable: mention rates range from 50% in India and Sweden down to 18-22% in Italy, Brazil, and the Netherlands.

Key points

61.7% of LLM citations are ghost citations: the AI links to the page but never names the brand in its answer
Only 13.2% of appearances result in both a citation link and a brand name mention
ChatGPT cites 87% of the time but names brands in only 20.7% of cases
Gemini names brands in 83.7% of cases but cites only 21.4% of the time
Comparative and evaluative content generates mentions; informational content earns citations without attribution
Dataset: 3,981 domains, 115 prompts, 14 countries, 4 AI engines

Key takeaway

Being cited by AI is not the same as being known. Getting linked without being named means users receive your information without knowing who provided it. Shifting content toward comparison, evaluation, and recommendation formats, where AI naturally attributes the source by name, is the practical way to close the gap between citation rate and actual brand visibility.

Also worth considering

The Gemini and ChatGPT split makes the optimisation problem LLM-specific rather than universal. A strategy focused entirely on earning ChatGPT citations may be building a large ghost citation footprint. Knowing which AI engines your audience uses most, and how each one handles brand attribution, makes GEO work more targeted rather than treating all AI visibility as equivalent.

What I'm testing

Querying ChatGPT and Gemini directly for target questions and noting whether citations include brand name mentions or just links. Where ghost citations appear, the page is likely informational. Testing whether adding explicit comparison statements and recommendation language to those pages shifts them from anonymously cited to named in the answer.
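
The manual check described above can be sketched in a few lines of Python. This is a minimal illustration, not Indig's methodology: the function name classify_appearance, the brand "Acme" and the domain acme.example are all hypothetical, and it assumes you have already copied the answer text and its source links out of the AI engine by hand.

```python
import re
from urllib.parse import urlparse


def classify_appearance(answer_text, cited_urls, brand, domain):
    """Label one AI answer as named citation, ghost citation, mention only, or absent."""
    # Cited: any source link whose host is the domain or a subdomain of it.
    hosts = [(urlparse(u).hostname or "").lower() for u in cited_urls]
    cited = any(h == domain or h.endswith("." + domain) for h in hosts)
    # Mentioned: the brand name appears as a whole word in the answer text.
    pattern = r"\b" + re.escape(brand) + r"\b"
    mentioned = re.search(pattern, answer_text, flags=re.IGNORECASE) is not None
    if cited and mentioned:
        return "named citation"
    if cited:
        return "ghost citation"
    if mentioned:
        return "mention only"
    return "absent"


# Hypothetical example: the page is linked but the brand is never named.
print(classify_appearance(
    "Several tools handle this well.",
    ["https://www.acme.example/guide"],
    "Acme",
    "acme.example",
))
```

Logging this label per prompt and per engine, before and after adding comparison language to a page, gives a simple way to see whether a page moves from ghost citation to named citation.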

Roger Montti / Search Engine Journal

68 Million AI Crawler Visits Show What Drives AI Search Visibility

Duda analysed 858,457 sites on its platform and recorded 68.9 million AI crawler visits in February 2026. OpenAI accounts for 81% of that total, Anthropic 16.6%, Perplexity 1.8%, and Google 0.6%. The purpose of that crawling has also shifted: 56.9% is now User Fetch, meaning real-time content retrieval to generate live answers, rather than training (28.8%) or content discovery (14.3%). ChatGPT is responsible for nearly all of the User Fetch traffic. The majority of AI crawling is happening to serve active users right now, not to build future models.

The data on what predicts crawling is the most directly actionable part. Sites with a Yext integration have a 97.1% AI crawl rate compared to 58% without. Sites with a Google Business Profile sync have a 92.8% crawl rate versus 58.9% without. Structured data, external integrations, and content depth all correlate with higher crawl frequency. Sites receiving AI crawler visits average 527.7 human sessions compared to 164.9 for non-crawled sites, a 3.2x difference. The direction of causation is not certain, but the association between AI crawling and human traffic is consistent across the dataset.
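
The gaps quoted above are easy to verify from the published figures. The short Python check below uses only the numbers in this section; the variable names are illustrative.

```python
# Figures quoted above from the Duda study: (crawl rate with, crawl rate without).
crawl_rates = {
    "Yext integration": (97.1, 58.0),
    "Google Business Profile sync": (92.8, 58.9),
}

# Percentage-point gap between sites with and without each integration.
gaps = {
    name: round(with_rate - without_rate, 1)
    for name, (with_rate, without_rate) in crawl_rates.items()
}

# Ratio of average human sessions, crawled vs non-crawled sites.
session_ratio = round(527.7 / 164.9, 1)

print(gaps)
print(session_ratio)
```

The gap works out to 39.1 percentage points for Yext and 33.9 for Google Business Profile, and the session ratio rounds to 3.2.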

Key points

OpenAI accounts for 81% of 68.9 million AI crawler visits in Feb 2026; Anthropic 16.6%, Perplexity 1.8%, Google 0.6%
56.9% of AI crawler activity is User Fetch (real-time answer generation), not training or discovery
59% of 858,457 sites received at least one AI crawler visit in February 2026
Sites with Yext integration: 97.1% crawl rate; sites with Google Business Profile sync: 92.8%
Crawled sites average 527.7 human sessions vs 164.9 for non-crawled sites (3.2x difference)
User Fetch activity is driven almost entirely by ChatGPT

Key takeaway

OpenAI's 81% crawler share confirms that for real-time AI retrieval, ChatGPT is the dominant engine right now. Structured data integrations appear to signal to AI crawlers that a site has reliable, organised information worth retrieving. If a site lacks these integrations and is not receiving AI referral traffic, that is the gap to close before trying more complex GEO tactics.

Also worth considering

Google's 0.6% AI crawler share in this dataset is striking given its overall dominance of web search. Googlebot handles traditional indexing separately, so this figure likely reflects Google's AI retrieval layer specifically. The practical implication: the assumption that Google dominates AI crawler traffic does not hold, and strategies focused on Googlebot optimisation are not automatically covering AI retrieval.

What I'm testing

Checking structured data completeness and third-party integrations on pages that are not showing AI referral traffic in GA4, comparing them against pages that are being actively retrieved. The Duda data suggests integrations are the missing signal. Also watching whether the human traffic and AI crawl correlation holds directionally as an early indicator of which pages are entering the retrieval loop.
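
One way to see which pages are entering the retrieval loop is to count AI crawler hits per page in server logs, split by purpose. The Python sketch below is a rough illustration under stated assumptions: the user-agent tokens and the purpose assigned to each are my reading of what the vendors publish, not something taken from the Duda study, and should be checked against current vendor documentation. The function names and sample log entries are hypothetical.

```python
from collections import Counter

# Assumed mapping from user-agent token to crawl purpose; verify against vendor docs.
AGENT_PURPOSE = {
    "ChatGPT-User": "user fetch",
    "Claude-User": "user fetch",
    "Perplexity-User": "user fetch",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "OAI-SearchBot": "discovery",
    "Claude-SearchBot": "discovery",
    "PerplexityBot": "discovery",
}


def crawl_purpose(user_agent):
    """Return the assumed purpose for an AI crawler user agent, or None if unrecognised."""
    ua = user_agent.lower()
    for token, purpose in AGENT_PURPOSE.items():
        if token.lower() in ua:
            return purpose
    return None


def purpose_by_page(hits):
    """Count AI crawler hits per (path, purpose) from (path, user_agent) pairs."""
    counts = Counter()
    for path, user_agent in hits:
        purpose = crawl_purpose(user_agent)
        if purpose is not None:
            counts[(path, purpose)] += 1
    return counts


# Hypothetical log extract: (request path, user-agent string).
sample_hits = [
    ("/pricing", "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"),
    ("/pricing", "Mozilla/5.0 (compatible; GPTBot/1.2)"),
    ("/blog/guide", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
]
print(purpose_by_page(sample_hits))
```

Pages with user fetch hits are the ones being retrieved for live answers, which is the group to compare against pages with no AI crawler activity at all.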

Lily Ray / Substack

The AI Slop Loop

Lily Ray names and documents the AI Slop Loop: a three-stage cycle where one AI-generated article invents false details, multiple AI content pipelines scrape and republish the misinformation, and enough citations accumulate that retrieval systems treat repetition as consensus. The mechanism matters because AI systems often rank credibility partly by how many sources corroborate a claim. If the sources are all downstream copies of the same fabrication, repetition gets mistaken for verification. Ray uses the March 2026 core update as a concrete example, where AI-generated articles invented nonexistent algorithmic details that then spread across dozens of other sites within days.

The scale data is sobering. Google's AI Overviews have a 91% accuracy rate, but applied to 5 trillion annual searches, that 9% error rate means tens of millions of wrong answers per hour. 56% of technically correct AI Overview responses were ungrounded, meaning the cited sources did not fully support the information provided. With ChatGPT's free tier representing 94% of its 900 million weekly active users, and AI Overviews reaching 2 billion monthly users, the loop is operating at a scale where even small error rates become enormous in absolute terms. Newer models are improving: GPT-5.4 produces 33% fewer false claims than GPT-5.2, but the structural incentive to republish and amplify rather than verify persists.
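
The "tens of millions per hour" figure can be reproduced with simple arithmetic. The Python sketch below assumes, as an upper bound, that every one of the 5 trillion annual searches shows an AI Overview. The ai_overview_share parameter is a hypothetical knob for testing lower trigger rates; it is not a figure from Ray's piece.

```python
ANNUAL_SEARCHES = 5_000_000_000_000  # 5 trillion, as quoted above
ERROR_RATE = 0.09                    # 100% minus the 91% accuracy figure
HOURS_PER_YEAR = 365 * 24


def wrong_answers_per_hour(ai_overview_share=1.0):
    """Estimated wrong AI Overview answers per hour.

    ai_overview_share is the assumed fraction of searches that show an
    AI Overview; 1.0 is an upper bound, not a measured value.
    """
    return ANNUAL_SEARCHES * ai_overview_share * ERROR_RATE / HOURS_PER_YEAR


print(f"{wrong_answers_per_hour() / 1e6:.1f} million wrong answers per hour")
```

Under the upper-bound assumption this comes to roughly 51 million per hour. Even if only a fifth of searches triggered an AI Overview, the estimate would still be about 10 million per hour.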

Key points

The AI Slop Loop has three stages: fabrication, amplification across AI content pipelines, legitimisation through citation volume
91% accuracy for AI Overviews means tens of millions of errors per hour at 5 trillion annual searches
56% of technically correct AI Overview responses were ungrounded: the cited sources did not fully support the claims made
ChatGPT free tier = 94% of 900 million weekly active users; AI Overviews reach 2 billion monthly users
GPT-5.4 produces 33% fewer false claims than GPT-5.2, but loop mechanics remain
Original research and primary data are the practical defence, since they cannot be traced back to AI fabrications

Key takeaway

The AI slop loop is a direct argument for original research and primary data. Content that relies on republishing information from other sources is more vulnerable to propagating fabrications than content grounded in things you have directly observed, measured, or experienced. The loop rewards content that cannot be reverse-engineered from another AI article.

Also worth considering

The 56% ungrounded figure for AI Overviews is the number worth watching. More than half of the technically correct answers contain information the cited sources do not fully support. Users reading AI answers in good faith are regularly encountering information that looks sourced but is not fully grounded. That is a long-term trust issue for AI search that will shape how seriously users treat AI-generated summaries over time.

What I'm testing

Running AI Overviews and ChatGPT answers for target queries and then checking whether the cited sources actually say what the AI claims. Where there is a gap between the source and the summary, that is a flag for adding structured schema claims to anchor the correct interpretation and reduce the chance of a fabricated version crowding it out.
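
A first pass at this check can be automated with a crude word-overlap score. The Python sketch below is a lexical proxy only: it flags claims for manual review and cannot confirm that a source actually supports a claim. The function names, the stopword list, and the 0.6 threshold are all arbitrary assumptions chosen for illustration.

```python
import re

# Small, arbitrary stopword list; extend as needed.
STOPWORDS = {
    "the", "a", "an", "of", "to", "in", "and", "is", "are", "that",
    "for", "on", "by", "with", "as", "it", "was", "were",
}


def content_words(text):
    """Lowercased word and number tokens with common stopwords removed."""
    tokens = re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def support_score(claim, source_text):
    """Fraction of the claim's content words that also appear in the source."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(source_text)) / len(claim_words)


def flag_ungrounded(claims, source_text, threshold=0.6):
    """Return claims whose word overlap with the source falls below the threshold."""
    return [c for c in claims if support_score(c, source_text) < threshold]


# Hypothetical example: one claim matches the source, one does not.
source = "The March update changed 79.5% of top 3 positions."
claims = [
    "The update changed 79.5% of top 3 positions",
    "The update targeted affiliate sites with thin reviews",
]
print(flag_ungrounded(claims, source))
```

Anything the function flags still needs a human read of the source; a high overlap score does not rule out a subtly wrong summary.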

Kristina Frunze / Search Engine Land

Why Bottom-of-Funnel Content Is Winning in AI Search

AI Overviews appear far more often for informational queries than for commercial or transactional ones. Ecommerce and decision-stage searches trigger them much less frequently, which means bottom-of-funnel content is relatively protected from the zero-click effect that has hit top-of-funnel pieces hard. Frunze's recommendation is to shift content allocation to 60-80% bottom- and mid-funnel work, with top-of-funnel content repositioned as topical authority infrastructure, passing link equity to conversion-focused pages rather than driving traffic directly.

The attribution challenge makes this shift harder to measure than it sounds. Bottom-of-funnel content often converts through subsequent direct visits or branded searches after a user discovers the brand through an AI citation. That journey shows up as direct or branded traffic in analytics, hiding the role the AI-cited content played in the decision. Standard attribution models undercount the commercial value of content that earns AI citations even when those citations do not drive a direct click. That means the ROI of decision-stage content is systematically underreported in most analytics setups.

Key points

AI Overviews trigger far more often for informational queries than commercial or transactional ones
Ecommerce searches trigger AI Overviews far less frequently, protecting conversion-stage content from zero-click
Recommended content allocation: 60-80% bottom- and mid-funnel, remainder for topical authority TOFU
TOFU content's primary role shifts to building expertise and passing link equity, not generating direct traffic
AI citations often convert via return visits or branded searches, not direct clicks, hiding attribution in analytics
Standard attribution models undercount bottom-of-funnel content that earns AI citations

Key takeaway

The content that is easiest to produce at volume (informational guides and how-to articles) is also the most exposed to AI zero-click. The content that requires more effort (comparison pages, use cases, problem-specific solutions) is better protected and converts better. That inversion is worth building into content planning directly rather than treating it as an edge case.

Also worth considering

The attribution problem cuts both ways. You may be undervaluing bottom-of-funnel content that earns AI citations if you are only measuring direct click-throughs. Before cutting investment in content that appears cited in AI but drives few direct clicks, check whether there is a corresponding pattern of direct or branded traffic growth over the same period.

What I'm testing

Tracking pages that earn AI citations for decision-stage queries and watching whether there is a lag-correlated rise in direct or branded searches in the weeks after. If the pattern holds, it is evidence that AI-cited BOFU content converts through return visits rather than direct clicks, which changes how those pages should be valued in any reporting model.

Yulia Deda / SE Ranking

March 2026 Core Update Was Far More Volatile Than December's: Over 24% of Top-10 Pages Gone

SE Ranking analysed 100,000 keywords across 20 niches in New York to compare the March 2026 and December 2025 core updates. March was substantially more disruptive across every metric. 79.5% of TOP 3 positions changed, up from 66.8% after December. Over 24% of pages that held TOP 10 positions dropped out of the TOP 100 entirely, nearly double December's 14.7%. Nearly 30% of current TOP 3 results ranked outside the TOP 20 before the March update, compared to 13% after December. The update created significantly more movement in both directions than the December update had.

Recovery rates were poor. Of domains losing TOP 10 positions, only 23.5% regained them during the update period. The March update also interacted with the earlier spam update: 82% of domains that dropped after the spam update remained absent after the core update, suggesting that spam-penalised sites are unlikely to recover through core update fluctuations alone. One consistent factor across both updates: domain age. Sites 15 years or older held a stable 57% share of TOP 10 results in both December and March, indicating that established domain authority remained protective even through a high-volatility update.

Key points

79.5% of TOP 3 positions changed in the March 2026 update, vs 66.8% after December 2025
Over 24% of TOP 10 pages dropped out of the TOP 100, nearly double December's 14.7%
Nearly 30% of current TOP 3 results ranked outside the TOP 20 before March, vs 13% after December
Only 23.5% of domains losing TOP 10 positions regained them during the update period
82% of sites penalised by the March spam update remained absent after the core update
Domains 15+ years old held a stable 57% of TOP 10 positions across both updates

Key takeaway

March was nearly twice as disruptive as December, and recovery rates are poor: fewer than 1 in 4 sites that lost top-10 positions got them back during the update period. If you are doing post-update triage, the data supports a different approach for pages that dropped to positions 11-100 versus those that fell below 100. The first group has recovery potential with content improvements. The second needs a fuller reassessment of the page's quality signals.

Also worth considering

The finding that 82% of spam-penalised sites stayed down through the core update matters for recovery planning. Google is maintaining those penalties across both update types, which suggests the spam and core updates are reinforcing the same quality signals rather than operating independently. Sites hoping to recover from spam penalties through core update fluctuations should not expect that to happen.

What I'm testing

Categorising post-update drops by severity: positions 11-20, positions 21-100, and below 100. The SE Ranking data suggests these require different responses. Position 11-20 drops are likely fixable with targeted content improvements. Drops below 100 are closer to a full quality reassessment, and spam-update overlaps suggest checking the technical health of those pages before touching the content.
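
The three severity buckets can be expressed as a small Python function for tagging a rank-tracking export. This is a minimal sketch: the function name drop_severity and the convention that a missing position means the page fell out of the top 100 are assumptions, not part of the SE Ranking study.

```python
def drop_severity(before, after):
    """Bucket a page's post-update position.

    before and after are ranking positions; after is None when the page
    no longer appears in the top 100.
    """
    if before > 10:
        return "not in top 10 before update"
    if after is not None and after <= 10:
        return "held top 10"
    if after is None or after > 100:
        return "below 100"
    if after <= 20:
        return "positions 11-20"
    return "positions 21-100"


# Hypothetical pages: (position before, position after).
for before, after in [(4, 14), (7, 63), (2, None), (5, 3)]:
    print(before, after, drop_severity(before, after))
```

Tagging every tracked page this way makes it straightforward to apply a different response to each group.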

That is issue #009. The ghost citation finding is the one that changed how I think about measuring GEO. Citation rate is not the same as brand visibility. If 61.7% of citations are anonymous links that never say your name, then optimising for citation volume without also optimising for attribution is building visibility that users cannot trace back to you. The split between how ChatGPT and Gemini handle attribution makes this engine-specific, which is uncomfortable but useful: it means the right format for one engine is not automatically the right format for the other.

The Lily Ray and Duda pieces are two sides of the same structural issue. AI systems are retrieving content at scale for live answers, and the quality filters are not keeping up with the volume. Original data, first-hand observation, and content that cannot be reverse-engineered from an AI article are the practical defences against being either replaced by or contaminated by the slop loop. That sits alongside the March core update finding, where established sites held their ground and intermediary content continued to lose ground. The through-line across all five articles this week is that depth and directness are being rewarded. Content that actually knows something, cites it correctly, and attributes it clearly is what is holding up in both traditional and AI search.
