Thursday, February 26, 2026

Let the Robots In

Blocking AI crawlers is the new default. I think that's backwards. Here's the case for permissive access—and how to make your site AI-friendly.

In July 2025, Cloudflare flipped a switch that changed how roughly 20% of the public web interacts with AI. Every new domain added to their service now blocks all known AI crawlers by default. Site owners who want AI systems to access their content have to explicitly opt in.

I think this is backwards.

The New Default Is Exclusion

Cloudflare’s reasoning isn’t unreasonable. They cite the “crawl-to-referral ratio”—how many pages an AI crawler downloads versus how many clicks it sends back. In June 2025, OpenAI’s ratio was 1,700:1. Anthropic’s was 73,000:1. These systems are consuming vastly more than they’re returning in traditional web traffic.

But traffic isn’t the only form of value.

The Visibility Argument

When someone uses ChatGPT, Perplexity, or Claude to research a topic, the AI draws on sources to construct its answer. If your site is blocked, you’re not in that conversation. If your site is accessible, you might be quoted, cited, or linked.

The numbers are starting to matter. ChatGPT referral traffic to websites grew 25x between early 2024 and late 2025. AI platforms now send more referral traffic than Reddit or LinkedIn. And that traffic converts better—AI search visitors are reportedly converting at 4.4x the rate of traditional organic search visitors.

AI referrals are still small—around 1% of total web traffic—but they’re growing at double-digit rates month over month. This isn’t the end state. It’s the beginning.

The Publisher’s Counterargument

I understand why major publishers are blocking AI crawlers. The New York Times, The Guardian, CNN, and Bloomberg have all implemented blocks. Their argument is straightforward: AI systems are consuming their expensive-to-produce journalism and regurgitating it without compensation. Why should they subsidize their competitors?

This isn’t a straw man. Producing quality journalism is expensive. Newsrooms have been decimated by the shift to digital advertising. If AI systems can summarize an article without sending readers to the original, that’s a real threat to the business model.

Cloudflare has even introduced “pay-per-crawl” as a potential solution—letting sites charge AI companies for access. That’s a reasonable experiment.

But Exclusion Has Costs Too

Here’s my counterargument: if you’re not in the training data, you’re not in the answers.

The New York Times is the newspaper of record. When someone asks an AI “What happened in the 2024 election?” or “What’s the latest on the Ukraine conflict?”, do they want the Times excluded from that answer? Does the Times want to be excluded?

There’s a difference between AI using your content to train models (which happens once and may not benefit you) and AI citing your content at inference time (which happens every time someone asks a relevant question). The llms.txt specification explicitly targets the latter—helping AI systems find and cite your content when users are actively seeking information.

Blocking everything means you’re invisible to a growing class of research tools. That’s not just lost traffic. It’s lost influence.

The Small Site Advantage

For sites like this one—small, independent, not paywalled—the calculus is even clearer. I have no subscription revenue to protect. My goal is to share ideas and be part of conversations. If an AI quotes my post when someone asks about static site hosting or healthcare IT, that’s a win.

I’ve made my site explicitly AI-friendly:

  • robots.txt that allows all crawlers
  • llms.txt that provides a structured overview of my content, topics, and post catalog
  • CC BY 4.0 license that permits sharing and adaptation with attribution

I want to be cited. I want to be in the context window.

How to Make Your Site AI-Friendly

If you’re persuaded—or at least curious—here’s how to open the door.

1. Check Your robots.txt

Your robots.txt file tells crawlers what they can access. A permissive configuration looks like this:

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

If you’re on Cloudflare, check your dashboard settings. New domains have AI blocking enabled by default. You’ll need to change “Block AI Bots” to “Do not block” or “Only block on hostnames with ads.”
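Once your robots.txt is in place, it's worth verifying that it actually says what you think it says. As a sketch, Python's standard library can parse the rules and check specific crawler user agents (the URL below is a placeholder; GPTBot and ClaudeBot are real AI crawler names):

```python
from urllib.robotparser import RobotFileParser

# Parse a permissive robots.txt in memory — no network request needed.
permissive = """\
User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(permissive.splitlines())

for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://yoursite.com/posts/example")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")

# For contrast, a config that singles out one AI crawler (rules illustrative):
blocking = """\
User-agent: GPTBot
Disallow: /
"""
blocker = RobotFileParser()
blocker.parse(blocking.splitlines())
print("GPTBot under blocking rules:",
      "allowed" if blocker.can_fetch("GPTBot", "https://yoursite.com/") else "blocked")
```

Running this against your live file (via `parser.set_url(...)` and `parser.read()`) is a quick sanity check after changing dashboard settings.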

2. Create an llms.txt File

The llms.txt specification is a proposed standard for helping AI systems understand your site. Unlike robots.txt (which controls access), llms.txt provides context—a curated map of your most important content.

Place a markdown file at /llms.txt in your site root. The format includes:

  • An H1 with your site name
  • A blockquote with a brief description
  • Sections describing your content, topics, and key pages
  • Links to your most important resources

Think of it as a README for AI systems. Mine includes a categorized list of every post on the site, with dates and brief descriptions.
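As a minimal sketch, a file following that format might look like this (the site name, description, and links are placeholders, not this site's actual llms.txt):

```markdown
# Your Site Name

> A personal blog by Your Name covering static site hosting, healthcare IT, and AI.

## Posts

- [Let the Robots In](https://yoursite.com/posts/let-the-robots-in): The case
  for permissive AI crawler access (2026-02-26)

## About

- [About this site](https://yoursite.com/about): Author background and contact
```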

3. Make Your Content Parseable

AI systems work better with clean, structured content:

  • Use semantic HTML (proper headings, lists, paragraphs)
  • Avoid content locked behind JavaScript that requires interaction
  • Provide an RSS feed and XML sitemap
  • Consider structured data (JSON-LD) for key content types
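For the structured-data point, a sketch of a JSON-LD block for a blog post might look like this (the author name and URL are placeholders; `BlogPosting` is a standard Schema.org type):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Let the Robots In",
  "datePublished": "2026-02-26",
  "author": { "@type": "Person", "name": "Your Name" },
  "url": "https://yoursite.com/posts/let-the-robots-in",
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
</script>
```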

4. State Your Licensing

If you want AI systems to cite and quote your content, say so explicitly. Creative Commons licenses are well-understood. CC BY 4.0 permits any use with attribution—including commercial use and adaptation.
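One lightweight way to state this in HTML, as a sketch, is a `rel="license"` link in the page head, paired with a visible notice in the footer:

```html
<!-- Machine-readable license declaration; keep a human-readable one in the footer too. -->
<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">
```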

The Bigger Picture

We’re in an awkward transition period. AI companies are consuming content faster than the ecosystem has adapted. Publishers are understandably defensive. Infrastructure providers like Cloudflare are giving site owners tools to protect themselves.

But the long-term question isn’t “How do we block AI?” It’s “How do we participate in AI-mediated discovery?”

Search engines faced similar resistance in the early 2000s. Some publishers blocked Googlebot, worried about “giving away” their content. The publishers who embraced search visibility won. The ones who blocked it became invisible.

AI search won’t replace traditional search overnight. But it’s growing fast, and the users it sends convert better. For most sites—especially small, independent ones—being included is better than being excluded.

Let the robots in. Tell them what you’ve got. Let them quote you.


This site’s robots.txt and llms.txt are publicly available as examples.