AI Penalties, Citation Patterns & Ranking Pipelines

Search Digest Issue #004 — week of 10 Mar 2026, AI penalties and citation pattern analysis

Two pieces from Chris Long's newsletter this week that sit at opposite ends of the AI content question. One is about what happens when you replace your writers with AI entirely — Google found it and erased the sites from the index. The other is about what happens when you use AI thoughtfully to understand how AI Mode actually selects the text it cites. The gap between those two outcomes is not a mystery.

Daniel Shashko's study of the hidden #:~:text fragment in AI Mode results is the most technically interesting GEO piece I have read in a while. Extracting 13,000 citations from over 500 queries and mapping exactly which sentence from which position on the page gets cited — this is the kind of ground-level data that changes how you think about sentence structure and content hierarchy.

Will Scott's Claude Code piece rounded out the week with the clearest practical walkthrough I have seen of using AI tooling to connect GSC, GA4, and Ads into something you can actually ask questions of in real time. The entity authority article from Benu Aggarwal is the foundational piece: it reframes the whole optimisation question around how machines understand what your brand is, not just what your pages say.

Daniel Shashko / Bright Data (via Chris Long)

AI Mode's Hidden #:~:text Fragment Reveals Exactly How It Chooses What to Cite

Daniel Shashko at Bright Data had the insight to extract and analyse the #:~:text fragment that appears in AI Mode citation URLs — the hidden parameter that records exactly which sentence from a page the AI selected. By pulling this from 13,000 citations across 500+ queries, he built a dataset that shows not just what gets cited but where on the page it sits, how long the sentence is, and what structural properties it has. Chris Long, who flagged this in his newsletter, described it as the study he had been waiting for someone to run.

The findings confirm and sharpen the patterns from Kevin Indig's earlier citation research. The mean position of cited content is 34% down the page — consistent with front-loaded content getting more citation weight. The mean cited sentence is 9.8 words, and sentences of 6 to 20 words account for 92% of all citations. That is not a coincidence — it is a structural preference for concise, declarative statements over long, qualified ones. And 95% of cited pages had at least one structured element: a list, table, or heading.

Key points

  • 71% of AI Mode results included a #:~:text fragment, confirming AI Mode records the specific sentence it drew from for the majority of citations
  • Mean cited content position: 34% down the page — reinforcing that early-page content has significantly higher citation probability
  • Mean cited sentence length: 9.8 words — sentences of 6 to 20 words account for 92% of all citations
  • 95% of cited pages contained at least one structured element — a list, table, or heading
  • 98% of pages with structured elements used an ordered or unordered list, making lists the single structural element most strongly linked to being cited

Key takeaway

Write sentences that can stand alone as facts. If a sentence requires the previous sentence to make sense, it is not structured for AI citation. Aim for the 6-20 word range for key claims and definitions, place them in the first third of the page, and follow them with a supporting list or structured breakdown. That is the citation-friendly pattern this data points to.

Also worth considering

The structured element finding is one of the strongest signals in this dataset. 95% of cited pages had at least one list, table, or heading — and the list correlation is near-universal. If your content is almost entirely prose without structure, you are competing at a disadvantage regardless of how good the writing is. Structure and content quality are not in tension here; both matter.

What I'm testing

Auditing the sentence length distribution on several pages that rank well but are not appearing in AI Mode citations. Looking for clusters of long, qualified sentences in the first 35% of each page and rewriting them as shorter, standalone factual statements.
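That audit is easy to rough out in a script. A minimal sketch, assuming plain extracted page text — the sentence splitting is deliberately naive (terminal punctuation plus whitespace), and a real audit would work from the rendered DOM with a proper tokenizer:

```python
import re

def long_early_sentences(text: str, head_share: float = 0.35,
                         max_words: int = 20):
    """Flag sentences in the first `head_share` of a page that exceed
    max_words — candidates for rewriting as standalone factual claims."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    head = sentences[: max(1, int(len(sentences) * head_share))]
    return [(len(s.split()), s) for s in head if len(s.split()) > max_words]
```

Anything this returns is a long, qualified sentence sitting in the zone where citations cluster — exactly the rewrite candidates described above.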

Read the full post

Glenn Gabe / Chris Long (LinkedIn)

Google Is Still Penalising Sites That Replace Writers With AI at Scale

Glenn Gabe found this one and Chris Long picked it up in his newsletter. An outlet called Esports Insider had announced a couple of weeks earlier that it was firing its writers and replacing them entirely with AI-generated content, bylined under fake author avatars. Google's web spam team noticed. Both of their domains now return roughly 10 results in a site: query — which is the closest Google gets to publicly confirming a manual action has buried a site.

The reason I am including this is not to make the obvious "AI content is bad" argument — that is not what the data shows. There are plenty of sites using AI thoughtfully that are performing well. What this case demonstrates is that aggressive, unchecked AI content at scale — especially combined with fake author signals — is still catching Google's attention and still resulting in severe consequences. The combination of fake bylines and complete writer replacement seems to have been the trigger.

Key points

  • Esports Insider publicly announced it was replacing all writers with AI-generated content and created fake author avatars for the bylines
  • Within two weeks, Google's web spam team had identified the sites — both domains now show approximately 10 results in a site: query
  • The "10 results" signal is consistent with a manual action for site reputation abuse, the same penalty category applied to parasite SEO
  • The combination of fake authorship signals and wholesale AI replacement appears to have been the trigger, rather than AI content alone
  • Google has confirmed site reputation abuse is still actively enforced — this is not a policy that exists on paper without execution

Key takeaway

Fake authorship combined with obvious AI content at scale is still a fast path to a manual action. If you are using AI in your content workflow, the authorship and editorial oversight layers are not optional — they are the difference between a tool and a liability. Transparency about how content is produced matters more now, not less.

Also worth considering

This case is useful evidence for conversations where AI content is being pushed as a pure cost reduction strategy with no editorial involvement. The reputational and indexation risk of getting it wrong is not theoretical. Two domains gone from meaningful index presence to 10 results in two weeks. That is a concrete outcome to reference.

Read the full post

Will Scott / Search Engine Land

How to Turn Claude Code Into Your SEO Command Centre

This came through in a newsletter on March 13 and I read it twice. Will Scott's setup uses Claude Code inside Cursor to pull data from Google Search Console, GA4, and Google Ads into a single interface where you can ask cross-platform questions in seconds rather than switching between tabs and building spreadsheets manually. The initial setup takes about an hour. Monthly data refreshes take five minutes.

The paid-organic gap analysis is where the concrete value shows up. The setup identified 351 keywords where organic performance was already strong but ad spend was continuing on the same queries — clear budget waste that would have taken hours to find manually across disconnected platforms. It also surfaced 41 content gaps with only paid visibility and no organic presence — each of those is a potential content brief that currently costs money to maintain.
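The cross-referencing logic itself is simple once the data sits in one place. A minimal sketch of the gap analysis — this is not Scott's actual script, and the field names (`query`, `position`, `cost`) are assumptions about what a GSC and an Ads export would carry:

```python
def paid_organic_gaps(gsc_rows, ads_rows, strong_position=3.0):
    """Cross-reference organic and paid query data.

    Returns (waste, gaps):
      waste - queries already ranking at or above strong_position
              that are still receiving ad spend
      gaps  - queries with paid spend but no organic presence at all
    """
    organic = {r["query"]: r["position"] for r in gsc_rows}
    waste, gaps = [], []
    for r in ads_rows:
        q = r["query"]
        if q not in organic:
            gaps.append(q)
        elif organic[q] <= strong_position:
            waste.append((q, r["cost"]))
    return waste, gaps

# Toy data for illustration
gsc = [{"query": "blue widgets", "position": 1.8},
       {"query": "widget repair", "position": 14.2}]
ads = [{"query": "blue widgets", "cost": 142.50},
       {"query": "widget installers", "cost": 67.10}]
waste, gaps = paid_organic_gaps(gsc, ads)
print(waste, gaps)
```

The value of the Claude Code setup is not this join — it is that the same loaded data can then answer the next ad-hoc question without another export.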

Key points

  • Python scripts pull data from Google APIs (GSC, GA4, Ads) into JSON files that Claude can cross-reference simultaneously without tab-switching
  • Setup takes approximately one hour; monthly data refresh takes five minutes to maintain
  • Paid-organic gap analysis found 351 keywords with strong organic performance still receiving ad spend — immediate budget optimisation opportunity
  • 41 content gaps identified with paid visibility only — potential organic content briefs currently being maintained purely through spend
  • Traditional metrics miss AI citation data from ChatGPT, Perplexity, and others — the setup surfaces this as a distinct visibility gap

Key takeaway

The value here is not the individual analyses — you could do them manually. It is the ability to ask compound questions across data sources in seconds rather than hours. "Which of our top-traffic organic pages have never appeared in an AI Mode citation?" becomes a question you can actually answer in the same workflow as your weekly ranking review.

Also worth considering

The paid-organic gap finding alone could justify the setup time in a single session. If you run paid search alongside SEO and have never specifically looked at the overlap, this is the kind of analysis that tends to find meaningful budget waste in most accounts. The tooling makes it repeatable rather than a one-off project.

What I'm testing

Building a scaled-down version of this setup using GSC and GA4 data exports to identify the paid-organic overlap on a site that has not had this analysis done. Interested to see how much of the pattern Scott found holds at a smaller account scale.

Read the full article

Benu Aggarwal / Search Engine Land

Why Entity Authority Is the Foundation of AI Search Visibility

Benu Aggarwal's piece is the kind of article that changes how you think about the optimisation question rather than adding another tactic. The core argument: we have moved past the era of optimising strings (keywords) and even the era of optimising things (entities in isolation). The current era is about optimising interconnected entity ecosystems — how your brand, your products, your people, and your attributes are linked and verified across the web.

The "comprehension budget" concept is the part I keep thinking about. AI models have finite computational resources. When data about your brand is unstructured, inconsistent, or absent from expected reference points, AI systems spend processing budget on verification rather than on recommendation. Brands that make their entity data clean and consistent — through schema markup, Wikidata presence, consistent NAP data — create what Aggarwal calls a "comprehension subsidy" that makes them easier to cite confidently.
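The "clean and consistent entity data" point is concrete enough to show. A minimal sketch of an Organization entity in JSON-LD, built in Python — the brand details and Wikidata ID are placeholders, not real identifiers; the point is the sameAs links, which let an AI system verify the entity against external references instead of spending comprehension budget working out who the brand is:

```python
import json

# Hypothetical Organization entity with external disambiguation links
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand",
    "url": "https://www.example.com/",
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example_Brand",   # placeholder
        "https://www.wikidata.org/wiki/Q0000000",         # placeholder
    ],
}
print(json.dumps(org, indent=2))
```

The markup only subsidises comprehension if it stays accurate — which is exactly the schema drift problem flagged in the key points below.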

Key points

  • Search has evolved from Strings (keywords) through Things (entities) to the current era of interconnected entity ecosystems
  • AI systems operate with a finite "comprehension budget" — inconsistent or absent entity data wastes it, reducing citation confidence
  • Five implementation steps: semantic auditing, precise Schema.org type mapping, nested relationships, external disambiguation (Wikipedia/Wikidata links), and ongoing governance
  • Brands that define executable Schema actions (BuyAction, ReserveAction) will be preferred transaction channels as AI agents move into purchase mode
  • Schema drift — where markup gradually diverges from actual business reality — is the most common governance failure

Key takeaway

Run a semantic audit before your next round of content investment. If your structured data is incomplete, outdated, or inconsistent with how you appear in Wikipedia, Wikidata, or Google's Knowledge Graph, you are asking AI systems to do verification work every time they consider citing you. Reducing that friction pays dividends across every piece of content you publish.

Also worth considering

The executable schema actions point at a near-future requirement. As AI agents increasingly take purchase actions rather than just providing recommendations, the sites with defined BuyAction and ReserveAction schema will be the ones those agents complete transactions on. Building that infrastructure now rather than retrofitting it later is the kind of investment that tends to look obvious in hindsight.

Read the full article

Jason Barnard / Search Engine Land

The 5 Competitive Gates Hidden Inside Rank and Display

This is part of Barnard's ongoing series on the AI recommendation pipeline and it covers the downstream half — what happens after a page has passed the infrastructure gates and is competing for the actual recommendation. If the 10-gate article from last week was about why content disappears before it is ever considered, this one is about why it loses at the point of consideration.

The five competitive gates — Annotated, Recruited, Grounded, Displayed, Won — are where brand reputation, entity clarity, and content authority actually interact. Being annotated correctly (accurately classified as the relevant entity) and recruited into the consideration set for a given query are the two gates where most brands quietly fail without knowing it. You can rank well in traditional search and still not make it into the AI consideration set if the system has uncertainty about what your brand actually is or does.

Key points

  • The five competitive gates (ARGDW): Annotated, Recruited, Grounded, Displayed, Won — these follow the five infrastructure gates and determine recommendation outcomes
  • Annotation is where the system classifies your content's entity — misclassification here means being recruited for the wrong queries and absent from the right ones
  • Recruitment into the consideration set is where entity clarity and brand reputation combine — unclear or poorly signalled entities are frequently skipped
  • Grounding is where the AI verifies its classification against external reference data — Wikipedia, Wikidata, and Knowledge Graph entries play a direct role here
  • Winning the recommendation requires the highest confidence score at Display — which depends on confidence accumulated across all nine prior gates

Key takeaway

If you are appearing in AI searches inconsistently — sometimes cited, often not, with no clear pattern — the most likely culprit is the Annotation or Recruitment gate. The system is uncertain what you are. Fixing entity clarity through schema, Knowledge Graph alignment, and external reference consistency tends to stabilise citation frequency more than adding new content does.

Also worth considering

The Grounding gate explains why Wikipedia and Wikidata matter for commercial brands. If an AI system cannot verify its entity classification against a trusted external reference, it defaults toward lower-confidence recommendation or skips the citation entirely. For brands without a Wikipedia presence, this is a meaningful gap — and it is more addressable than most people assume.

Read the full article

That is issue #004. The thread running through this week is that AI search optimisation is fundamentally about reducing uncertainty — for the systems deciding what to cite, for the algorithms classifying what you are, and for the tools you use to understand what is working. Every piece this week contributes a different dimension of that same problem.

The Esports Insider story is the cautionary note. The citation study and entity authority pieces are the constructive path forward. If any of this changes how you are approaching structured data or content structure, I would like to hear what you are doing with it.
