04 — Measure

Is recognition actually increasing?

Track mention rate, entity presence, and visibility over time — with data you can actually trust.

Most teams measure AI visibility the same way: type a question into ChatGPT, see what comes up, take a screenshot, and call it research. But AI Search isn’t deterministic. Ask the same question twice and you’ll get different answers. Small sample sizes are misleading, random prompts create bias, and one-off screenshots prove nothing. And when the answer changes every time, there is no stable ranking to track. So what can you measure? The most defensible metric available right now is mention rate: across N prompt variations with k runs each, what percentage of responses mentioned your brand? That’s what this stage tracks: reliable data instead of screenshots.

→ Why mention rate is the right metric to track
→ The problem with one-off prompts and screenshots
→ What you can and can’t say with mention rate
→ How N × k sampling works and why sample size matters
→ The tool that puts this into practice

The problem

We’re measuring AI visibility wrong — and the data that proves it.

Most teams are measuring AI visibility the same way: type a question into ChatGPT, see what comes up, take a screenshot, repeat a few times, call it research. The problem is that AI Search isn’t deterministic. Ask the same question multiple times and you’ll get different answers. Change the phrasing slightly and the results shift again.

Small sample sizes are misleading. Random prompts create bias. One-off screenshots prove nothing.
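To put a rough number on “small sample sizes are misleading,” here is a minimal sketch. It rests on a simplifying assumption: each check is an independent yes/no draw with a fixed true mention rate, although real runs are not perfectly independent. The code computes exactly how often n checks report a mention rate far from the truth. The function name `prob_misleading` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_misleading(true_rate: Fraction, n: int, delta: Fraction) -> Fraction:
    """Exact probability that n independent checks produce an observed
    mention rate at least `delta` away from `true_rate`.

    Assumes every check mentions the brand with the same probability
    `true_rate`, independently of the others (a binomial model).
    """
    total = Fraction(0)
    for k in range(n + 1):  # k = number of checks that mention the brand
        observed = Fraction(k, n)
        if abs(observed - true_rate) >= delta:
            # Binomial probability of exactly k mentions out of n checks
            total += comb(n, k) * true_rate**k * (1 - true_rate) ** (n - k)
    return total


if __name__ == "__main__":
    half = Fraction(1, 2)  # brand is truly mentioned in 50% of answers
    # One screenshot can only show 0% or 100%, so it is always 50 points off.
    print(prob_misleading(half, 1, Fraction(1, 2)))  # 1
    # Five checks land 30+ points off the truth 3 times in 8.
    print(prob_misleading(half, 5, Fraction(3, 10)))  # 3/8
    # 100 samples almost never miss by that much.
    print(float(prob_misleading(half, 100, Fraction(3, 10))))
```

Take a brand that is truly mentioned half the time. A single screenshot is always 50 points off. Five checks miss by 30 points or more three times in eight. A sample of 100 almost never misses by that much.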

The shift

Did we appear in this one prompt?

↓

Across N prompt variations with k runs each, what percentage mentioned us?

That framing turns AI visibility from a binary outcome into a distribution, which is how LLMs actually behave: each answer is one draw from a range of possible answers. It gives you a probability, not a position.
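Here is a minimal sketch of that framing. It assumes responses are already collected in a grid where `responses[i][j]` is the text of run j for prompt variation i. It also assumes a naive case-insensitive substring match counts as a mention; real detection would also need to handle aliases and misspellings. The function names and sample answers are hypothetical.

```python
def is_mention(text: str, brand: str) -> bool:
    # Naive detection: case-insensitive substring match (an assumption).
    return brand.lower() in text.lower()


def per_prompt_rates(responses: list[list[str]], brand: str) -> list[float]:
    """Share of runs that mention `brand`, for each prompt variation."""
    return [
        sum(is_mention(text, brand) for text in runs) / len(runs)
        for runs in responses
    ]


def mention_rate(responses: list[list[str]], brand: str) -> float:
    """Share of all N x k responses that mention `brand`."""
    all_runs = [text for runs in responses for text in runs]
    return sum(is_mention(text, brand) for text in all_runs) / len(all_runs)


if __name__ == "__main__":
    # Hypothetical answers: N = 2 prompt variations, k = 3 runs each.
    responses = [
        ["Try Acme or Globex.", "Globex is popular.", "acme is a solid pick."],
        ["Initech works well.", "Consider Acme.", "Globex and Initech."],
    ]
    print(per_prompt_rates(responses, "Acme"))  # one rate per prompt variation
    print(mention_rate(responses, "Acme"))  # 0.5
```

The per-prompt rates are the distribution. Pooling all N × k runs gives the headline mention rate. When every prompt gets the same k, the pooled rate equals the average of the per-prompt rates.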

The right metric

Mention rate.

If rankings don’t apply, what does? The most defensible metric available right now is mention rate. It’s simple: across N prompt variations with k runs each, what percentage mentioned your brand?
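As a formula, mention rate is a plain average over all N × k responses. Here m_ij = 1 if run j of prompt variation i mentions your brand, and 0 otherwise:

```latex
\text{mention rate} \;=\; \frac{1}{N k} \sum_{i=1}^{N} \sum_{j=1}^{k} m_{ij},
\qquad m_{ij} \in \{0, 1\}
```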

When pollsters want to know who’s winning an election, they don’t ask one person. They sample many voters across regions and over time. AI visibility measurement works the same way.
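Applied to prompts, one way to get that kind of spread is to build prompt variations systematically from a fixed grid of phrasings and topics instead of typing whatever comes to mind. You would then reuse the same set each time you measure. This is offered here as an assumption rather than part of the methodology. The templates, topics, and `build_prompt_set` below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical phrasings and topics; replace them with your own category.
TEMPLATES = [
    "What is the best {topic}?",
    "Which {topic} would you recommend?",
    "What {topic} do experts recommend?",
    "I'm choosing a {topic}. What are my options?",
]
TOPICS = [
    "CRM for small teams",
    "newsletter platform",
    "project management tool",
    "reporting dashboard",
    "help desk tool",
]


def build_prompt_set(templates: list[str], topics: list[str]) -> list[str]:
    """Every phrasing crossed with every topic, in a stable order,
    so the same prompt set can be reused from one measurement to the next."""
    return [tpl.format(topic=topic) for tpl, topic in product(templates, topics)]


if __name__ == "__main__":
    prompts = build_prompt_set(TEMPLATES, TOPICS)
    print(len(prompts))  # 20 (4 phrasings x 5 topics)
    print(prompts[0])  # What is the best CRM for small teams?
```

Four phrasings across five topics gives 20 prompts, the size of the Exploratory tier. Keeping the list fixed means month-to-month changes reflect changes in the answers, not in the questions.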

N × k sampling

Sample size is everything.

Large-scale sampling shows that AI responses vary dramatically across models, phrasings, and runs. Yet top brands still appeared in 55–77% of responses despite all that variation. The metric that matters is visibility percentage across diverse prompts, not ranking position.
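Every brand can be scored against the same set of responses. Comparing brands therefore amounts to computing each one's mention rate over one shared N × k sample. Here is a minimal sketch, assuming naive case-insensitive substring matching. The brand names and answers are hypothetical.

```python
def brand_rates(responses: list[str], brands: list[str]) -> dict[str, float]:
    """Mention rate for each brand over one shared list of N x k responses.

    Detection is a naive case-insensitive substring match (an assumption).
    A response can mention several brands, so rates need not sum to 1.
    """
    lowered = [text.lower() for text in responses]
    return {
        brand: sum(brand.lower() in text for text in lowered) / len(lowered)
        for brand in brands
    }


if __name__ == "__main__":
    # Hypothetical flattened responses (all prompts, all runs).
    responses = [
        "Acme and Globex are both solid.",
        "Most people pick Acme.",
        "Globex, Acme, or Initech.",
        "Initech is cheapest.",
        "Acme leads this category.",
    ]
    rates = brand_rates(responses, ["Acme", "Globex", "Initech"])
    print(rates)  # {'Acme': 0.8, 'Globex': 0.4, 'Initech': 0.4}
    # A ratio of two rates on the same sample: Acme appears 2x as often.
    print(rates["Acme"] / rates["Globex"])  # 2.0
```

A ratio of two rates is what sits behind a claim like “we outperform Competitor X by 1.8×.” It carries the sampling noise of both rates, so it is only as trustworthy as the sample behind it.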

Tier 1 — Exploratory

20 prompts × 5 runs = 100 samples. Good for early exploration or pressure-testing a topic.

±10% margin

Tier 2 — Monitoring

40 prompts × 10 runs = 400 samples. Suitable for monthly tracking and internal reporting.

±5% margin

Tier 3 — High-stakes

40 prompts × 30 runs = 1,200+ samples. Decision-grade data for board reporting.

±3% margin
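The margins in these tiers line up with the standard worst-case formula for a proportion at 95% confidence, z·√(p(1−p)/n), with z = 1.96 and p = 0.5. That formula gives about ±9.8% for 100 samples and ±4.9% for 400. For 1,200 samples it gives about ±2.8%, and reaching ±2% takes roughly 2,400 samples. Here is a minimal sketch; the function names are hypothetical.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from n samples.

    Uses the normal approximation and treats every sample as independent.
    p = 0.5 gives the widest (worst-case) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


def samples_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


if __name__ == "__main__":
    for prompts, runs in [(20, 5), (40, 10), (40, 30)]:
        n = prompts * runs
        print(f"{prompts} x {runs} = {n}: ±{margin_of_error(n):.1%}")
    print(samples_needed(0.02))  # roughly 2,400
```

The formula treats all N × k samples as independent. Repeated runs of the same prompt tend to resemble each other, so real-world margins are likely somewhat wider than these figures.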

Using the number

What you can and can’t do with mention rate.

You can

✓ Say “We improved from 42% to 52% over Q1”
✓ Say “We outperform Competitor X by 1.8×”
✓ Track month-over-month trends
✓ Set benchmarks and test interventions

You can’t

✕ Say “We rank #3” (there is no rank)
✕ Panic over one bad screenshot
✕ Expect the same answer twice
✕ Treat this like keyword position tracking

One rule matters more than any of the others: only compare distributions to distributions, never single data points to single data points. If your margin of error is ±5%, a 2-point drop is noise. A 9-point lift probably isn’t.
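One way to apply this rule is a standard two-proportion z-test at 95% confidence. That test is our choice, not something the methodology specifies. The sketch assumes each period's samples are independent and both periods use the same prompt set. The margin on a difference is wider than the margin on either number alone: two ±5% measurements give roughly ±7 points on their difference. The function names are hypothetical.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the change from rate p1 to rate p2,
    using the pooled-proportion standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


def is_real_change(p1: float, n1: int, p2: float, n2: int, z_crit: float = 1.96) -> bool:
    """True if the change clears the two-sided 95% threshold."""
    return abs(two_proportion_z(p1, n1, p2, n2)) >= z_crit


if __name__ == "__main__":
    n = 400  # Tier 2 size (40 prompts x 10 runs) in each period
    print(is_real_change(0.42, n, 0.52, n))  # 10-point lift: True
    print(is_real_change(0.50, n, 0.48, n))  # 2-point drop: False
    print(is_real_change(0.40, n, 0.49, n))  # 9-point lift: True
```

At 400 samples per period, the 42% → 52% move clears the bar, the 2-point drop does not, and the 9-point lift does.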

Start measuring

The tool that puts this into practice.

Beta v2.0

Mention Rate Tool

Runs structured query sampling across AI platforms and gives you actual data — not vibes, not screenshots, not a handful of prompts. Built on the Signals Over Noise methodology.

The full framework

Define

What do you want to be known for?

Structure

Can AI systems understand it?

Reinforce

Are the right signals strengthening recognition?

Measure