GEO Has a Measurement Problem You May Not Know About

Everyone is talking about optimizing content for AI. Fewer people are asking a more basic question: Is AI even reading it?

That gap between investing in GEO and being able to measure its results is the quiet problem sitting underneath most content strategies right now. And it’s more fixable than most marketers realize.

Your analytics aren’t broken. They just weren’t built for this.

Picture a normal Monday morning. You open your analytics dashboard. Sessions look fine, nothing unusual. What the dashboard didn’t show you is that over the weekend, GPTBot, ClaudeBot, and PerplexityBot each visited dozens (possibly hundreds) of your pages. They read your content, processed it, and moved on. No session was recorded. No visit was logged. As far as your tools are concerned, it didn’t happen.

This isn’t a bug. It’s a structural limitation.

Standard analytics tools like Google Analytics run as JavaScript in the user’s browser. AI crawlers don’t use a browser. They fetch your page content directly at the server level and leave — no script executes, no session fires, nothing gets counted. The tools we’ve relied on for 15 years were built to track humans, and AI crawlers simply don’t behave like humans.
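
To make that concrete, here’s a minimal sketch (Python, using the requests library) of what an AI crawler’s visit amounts to: a plain HTTP GET with a self-identifying User-Agent header. The URL and the shortened User-Agent string are placeholders; real crawlers send longer, versioned strings that their operators document.

```python
# Minimal illustration of how an AI crawler "reads" a page: a plain HTTP GET,
# with no browser and no JavaScript engine on the other end.
# The URL and User-Agent below are placeholders, not real values.
import requests

resp = requests.get(
    "https://example.com/your-article",
    headers={"User-Agent": "GPTBot/1.0 (illustrative)"},
    timeout=10,
)

html = resp.text
print(f"{len(html)} bytes of HTML fetched")
print("analytics <script> tags present in the markup:", "<script" in html.lower())
# The analytics snippet is in the HTML, but nothing ever executes it,
# so no pageview, no session, and nothing in your client-side dashboard.
```

The only trace of that visit exists at the server or CDN level, which is exactly where the measurement options described later in this piece look.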

The loop that search had and GEO doesn’t (yet)

With Google, the measurement loop was always closed. Google crawls your page, indexes it, ranks it, and sends you traffic. You can see every step of that chain in Search Console and Analytics. You optimize, you measure, you adjust.

AI crawlers break that loop. When GPTBot visits your page, it might be training a model, populating a RAG system, or feeding a future AI answer, none of which necessarily sends a visitor back to your site. Cloudflare data show a widening gap between the amount of content AI bots consume and the amount of traffic they return. The consumption is real. The referral, for now, often isn’t.

This means you can publish a well-structured, thoroughly optimized piece of content and have no idea whether any AI system has noticed it. You’re essentially broadcasting without a signal meter.

What you can actually measure

The good news is that this is changing. Tools are now emerging that give marketers a real window into AI crawler activity, and some of the most accessible ones are free.

Here’s what you can start tracking:

  • Which AI systems are crawling your site: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and others, all of which identify themselves when they visit (see the sketch just after this list)
  • Which pages they’re accessing, and how often
  • Why they’re crawling: AI crawlers broadly fall into three purposes: training data collection, search/retrieval (to power AI answers), and user-triggered fetches (when a user directly asks an AI to look something up)
  • Whether those crawlers ever send traffic back: the crawl-to-referral ratio
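
Here’s a rough sketch of what “identify themselves” means in practice: a lookup keyed on User-Agent substrings. The tokens and purpose labels below reflect crawlers their operators have documented publicly, but the list shifts over time, so treat it as illustrative and check OpenAI’s, Anthropic’s, and Perplexity’s published crawler pages for the current picture.

```python
# Illustrative map of self-identifying AI crawlers, keyed on a User-Agent substring.
# Purpose labels follow each operator's public documentation at the time of writing;
# operators add and rename crawlers, so verify against their docs before relying on this.
AI_CRAWLERS = {
    "GPTBot":          ("OpenAI",     "training"),
    "OAI-SearchBot":   ("OpenAI",     "search / retrieval"),
    "ChatGPT-User":    ("OpenAI",     "user-triggered fetch"),
    "ClaudeBot":       ("Anthropic",  "training / general crawling"),
    "Claude-User":     ("Anthropic",  "user-triggered fetch"),
    "PerplexityBot":   ("Perplexity", "search / retrieval"),
    "Perplexity-User": ("Perplexity", "user-triggered fetch"),
}

def classify(user_agent: str):
    """Return (operator, purpose) for a known AI crawler, or None for everything else."""
    for token, info in AI_CRAWLERS.items():
        if token in user_agent:
            return info
    return None
```

The crawl-to-referral ratio in the last bullet is then simple arithmetic: sessions referred by AI platforms over a given period, divided by crawler hits over that same period.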

One important caveat to set realistic expectations: a crawl is not a citation. Seeing GPTBot on your pages means your content is being evaluated — it does not confirm it was used in an AI response. Think of it as the earliest observable signal in the chain, not the end result. That distinction matters when reporting this internally.

How to get this data

There’s a spectrum of options here, ranging from zero setup to “talk to your infrastructure team.”

The easiest starting point: Microsoft Clarity (WordPress plugin)

If your site runs on WordPress, this is the most straightforward path. The latest version of the Microsoft Clarity plugin includes an AI Bot Activity dashboard out of the box: no server configuration, no CDN setup required. It’s free, and the interface will be familiar to anyone who has used Clarity for heatmaps or session recordings.

The dashboard shows crawler volume by operator, which pages are being accessed, and the share of your total traffic that comes from AI bots rather than humans. For most marketing teams, this is enough to get started.

The more robust option: CDN-based integration

Clarity’s AI Bot Activity feature (and Cloudflare’s own bot analytics) can pull data from server-side logs when connected to a CDN. This is more accurate and more detailed than the WordPress plugin approach, because it captures activity at the infrastructure level before any application logic runs.

The most accessible CDN for this is Cloudflare, which a large share of websites already use or can adopt. The setup involves routing your DNS through Cloudflare (a meaningful step, but a one-time one), and it comes with other benefits (performance, security, and DDoS protection). Other supported options include Fastly, Azure Front Door, and Akamai. Note that CDN-level log processing can introduce additional costs depending on your provider and traffic volume, so it’s worth checking before you commit.

For most teams without an existing CDN setup, the honest recommendation is: start with the Clarity WordPress plugin to validate that this data is worth caring about, then consider a CDN integration if you want more depth.

What about server log analysis?

It’s technically possible. AI crawlers identify themselves via user-agent strings in your server access logs, and with enough time, you could parse that data manually. In practice, it’s slow, messy, and requires ongoing maintenance. Unless your team already has a log management platform like Datadog or Elastic in place, it’s not worth the effort when better options exist.
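
If you do want to see what the manual route involves before dismissing it, here is a rough sketch, assuming a standard combined log format (Nginx or Apache) and a local access.log file; the file path, regex, and token list would all need adjusting to your own setup.

```python
# Rough sketch of manual access-log analysis: count AI-crawler hits per crawler
# and per URL from a combined-format log. Path, regex, and token list are assumptions.
import re
from collections import Counter

# Combined format: ip - user [time] "METHOD path HTTP/x" status bytes "referer" "user-agent"
LINE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

hits_per_bot = Counter()
hits_per_page = Counter()

with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        m = LINE.search(line)
        if not m:
            continue
        token = next((t for t in AI_TOKENS if t in m.group("ua")), None)
        if token:
            hits_per_bot[token] += 1
            hits_per_page[m.group("path")] += 1

print("Hits by crawler:", dict(hits_per_bot))
print("Most-crawled pages:", hits_per_page.most_common(10))
```

Multiply that by log rotation and multiple servers, and the “slow, messy, ongoing maintenance” point makes itself.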

Paid AI visibility platforms

A growing category of dedicated tools, including Profound, Promptmonitor, and others, combines crawler analytics with citation and mention tracking across AI platforms. These give a more complete picture of the full chain from “AI crawled your content” to “AI mentioned your brand in a response.” They’re worth knowing about, especially as this space matures, but they’re oriented toward teams with dedicated analytics resources and corresponding budgets.

What to do with the data

Once you have visibility into crawler activity, a few immediate uses become available:

Cross-reference your highest-crawled pages with your GEO content investments. Are AI systems finding the content you optimized for them, or are they spending time on pages you didn’t prioritize? That gap is a useful diagnostic.
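
As a hypothetical sketch of that check, with placeholder data standing in for whatever your crawler dashboard or log script reports:

```python
# Hypothetical diagnostic: compare the pages AI crawlers actually hit against the
# pages you optimized for GEO. Both inputs below are placeholders.
crawl_counts = {"/blog/old-post": 87, "/guides/pricing": 41, "/glossary/geo": 2}
geo_priority_pages = {"/guides/pricing", "/guides/integrations", "/glossary/geo"}

never_crawled = sorted(geo_priority_pages - crawl_counts.keys())
unplanned = sorted(set(crawl_counts) - geo_priority_pages, key=crawl_counts.get, reverse=True)

print("Optimized for GEO but not crawled:", never_crawled)
print("Crawled heavily but never prioritized:", unplanned)
```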

Look at the crawl purpose. Training-oriented crawls and retrieval-oriented crawls are different things. A page crawled repeatedly by a retrieval bot is a stronger signal of potential AI answer inclusion than one scraped for training data.

Use it as a baseline. Right now, most marketing teams have no AI crawler data at all. Getting even a basic month of data gives you a reference point to measure against, which is more than most of your competitors have.

The early-mover window

Most marketing teams running GEO strategies today are doing so without any measurement layer underneath them. The tools to fix that exist, some of them are free, and almost nobody in marketing is using them for this purpose yet.

Getting AI crawler visibility set up now is a bit like installing Google Analytics before your competitors thought to ask where the traffic came from. The data is there. You just have to decide to look at it and find a tool that can make it accessible.
