04 — Measure
Is recognition actually increasing?
Track mention rate, entity presence, and visibility over time — with data you can actually trust.
- → Why mention rate is the right metric to track
- → The problem with one-off prompts and screenshots
- → What you can say, and can’t say, with mention rate
- → How N × k sampling works and why sample size matters
- → The tool that puts this into practice
The problem
We’re measuring AI visibility wrong, and the data proves it.
Most teams are measuring AI visibility the same way: type a question into ChatGPT, see what comes up, take a screenshot, repeat a few times, call it research. The problem is that AI Search isn’t deterministic. Ask the same question multiple times and you’ll get different answers. Change the phrasing slightly and the results shift again.
Small sample sizes are misleading. Random prompts create bias. One-off screenshots prove nothing.
The shift
Did we appear in this one prompt?
↓
Across N prompt variations with k runs each, what percentage mentioned us?
That framing turns AI visibility from a binary outcome into a distribution — which is exactly what LLMs produce. It gives you a probability, not a position.
The right metric
Mention rate.
If rankings don’t apply, what does? The most defensible metric available right now is mention rate. It’s simple: across N prompt variations with k runs each, what percentage mentioned your brand?
When pollsters want to know who’s winning an election, they don’t ask one person. They sample many people, across regions and over time. AI visibility measurement works the same way: many prompts, many runs, repeated over time.
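The mention-rate definition above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the `mention_rate` function, the sample prompts, and the brand names are all hypothetical, and it assumes the responses have already been collected (N prompts, k runs each) and that a case-insensitive whole-word match is a good enough test for a "mention".

```python
import re

def mention_rate(responses, brand):
    """Share of all responses that mention `brand`.

    `responses` maps each prompt variation to the list of answers
    collected for it (k runs per prompt). Hypothetical data shape.
    """
    # Whole-word, case-insensitive match; a real setup may also need aliases.
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    total = 0
    mentions = 0
    for runs in responses.values():
        for text in runs:
            total += 1
            if pattern.search(text):
                mentions += 1
    if total == 0:
        raise ValueError("no responses to score")
    return mentions / total

# Hypothetical sample: N = 2 prompt variations, k = 3 runs each.
sample = {
    "best crm for startups": [
        "Try Acme or Globex.",
        "Globex is popular.",
        "Acme is a solid pick.",
    ],
    "which crm should a small team use": [
        "Consider Globex.",
        "acme and Initech both work.",
        "Initech.",
    ],
}

print(f"{mention_rate(sample, 'Acme'):.0%}")  # 3 of 6 responses -> 50%
```

Pooling every run into one denominator keeps the number easy to explain. Grouping by prompt first is a reasonable alternative if some prompts are run more often than others.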
N × k sampling
Sample size is everything.
Large-scale sampling shows that AI responses vary dramatically — different models, different phrasings, different runs. But top brands still appeared consistently across 55–77% of responses despite all that variation. The metric that matters is visibility percentage across diverse prompts, not ranking position.
Tier 1 — Exploratory
20 prompts × 5 runs = 100 samples. Good for early exploration or pressure-testing a topic.
±10% margin
Tier 2 — Monitoring
40 prompts × 10 runs = 400 samples. Suitable for monthly tracking and internal reporting.
±5% margin
Tier 3 — High-stakes
40 prompts × 30 runs = 1,200 samples. Decision-grade data for board reporting.
±3% margin
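One common way to estimate a margin like the ones quoted for each tier is the normal approximation for a proportion. The sketch below assumes a 95% confidence level (z = 1.96), the worst case p = 0.5, and fully independent samples. The page does not state how its margins were derived, so treat this as an illustration; the `margin_of_error` helper is a hypothetical name.

```python
import math

def margin_of_error(n_samples, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes independent samples and a normal approximation;
    p=0.5 is the worst case, z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n_samples)

tiers = [("Tier 1", 20, 5), ("Tier 2", 40, 10), ("Tier 3", 40, 30)]
for label, prompts, runs in tiers:
    n = prompts * runs
    print(f"{label}: n={n}, margin ±{margin_of_error(n):.1%}")
```

Under these assumptions the three tiers come out to roughly 9.8%, 4.9%, and 2.8%. Repeated runs of the same prompt are unlikely to be fully independent, so the true uncertainty is probably somewhat wider than these figures.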
Using the number
What you can and can’t do with mention rate.
You can say
- ✓“We improved from 42% to 52% over Q1”
- ✓“We outperform Competitor X by 1.8×”
- ✓Track month-over-month trends
- ✓Set benchmarks and test interventions
You cannot say
- ✕“We rank #3” — there is no rank
- ✕Panic over one bad screenshot
- ✕Expect the same answer twice
- ✕Treat this like keyword position tracking
One rule that matters more than any of the others: only compare distributions to distributions. Never compare single data points to single data points. If your margin of error is ±5%, a 2-point drop means nothing. A 9-point lift probably does.
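The "distributions to distributions" rule can be made concrete by checking whether the change between two measurement periods is larger than the combined uncertainty of both. The sketch below is one way to do that, assuming independent samples and a 95% normal approximation; `compare_rates` and the counts passed to it are hypothetical.

```python
import math

def compare_rates(hits_a, n_a, hits_b, n_b, z=1.96):
    """Compare two mention rates (period A vs. period B).

    Returns (difference, margin of the difference, clears_margin).
    Assumes independent samples and a normal approximation.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    diff = p_b - p_a
    # The standard error of a difference combines both periods' variances.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = z * se
    return diff, margin, abs(diff) > margin

# Hypothetical counts: 400 samples per period (Tier 2).
lift = compare_rates(168, 400, 208, 400)  # 42% -> 52%
dip = compare_rates(168, 400, 160, 400)   # 42% -> 40%

for name, (diff, margin, clear) in [("lift", lift), ("dip", dip)]:
    print(f"{name}: {diff:+.1%} vs ±{margin:.1%} -> {'signal' if clear else 'noise'}")
```

Note that the margin on a difference is wider than the margin on either measurement alone: two readings at about ±5% each give roughly ±7% on the change between them, which is why a 2-point move is noise while a 9- or 10-point move is more likely to be real.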
Start measuring
The tool that puts this into practice.
Beta v2.0
Mention Rate Tool
Runs structured query sampling across AI platforms and gives you actual data — not vibes, not screenshots, not a handful of prompts. Built on the Signals Over Noise methodology.
The full framework
- 01 Define: What do you want to be known for?
- 02 Structure: Can AI systems understand it?
- 03 Reinforce: Are the right signals strengthening recognition?
- 04 Measure: Is recognition actually increasing? ← You are here