The 30-second take. An AI visibility test is a manual diagnostic that checks whether ChatGPT, Gemini, Claude, Perplexity, and Google AI Overviews mention your clinic, describe it accurately, connect it to the right procedure, and cite sources a patient can verify. The simplest version is a 5 × 5 × 3 test: five patient-style prompts, five AI platforms, three runs per prompt. That gives you 75 observations instead of one random answer. The output of that test is your real baseline. Anyone telling you what to do without it is selling.
1. Why a normal Google search does not tell you what you need to know
When most surgeons want to check their visibility, they Google their own name. The clinic site appears first, the Google Business Profile (GBP) panel shows up on the right rail, a few directory listings load underneath, and the conclusion is always the same: I am visible, the patients are just not converting.
That conclusion is wrong because the patient is not Googling your name. The patient is asking a model who they should trust for a procedure, and the model is producing a synthesized answer with reasoning and sometimes citations. Whether your name appears in that answer is a different question than whether your domain ranks on the SERP.
I learned this while running marketing at VIDA Wellness & Beauty Center. We had decent organic rankings on the standard procedure-plus-city queries. We were spending more on paid ads than the inbound was justifying. Then we started asking patients in consultation how they had heard about us, and one in three was saying some version of “ChatGPT mentioned you when I asked about Tijuana surgeons.” So we typed the prompts ourselves. We surfaced reliably for some procedures and never appeared for others. The Google ranking did not predict the AI mention.
The patient is not searching for your website. She is asking who belongs on the shortlist. A clinic can rank on Google and still be invisible in the answer that actually shaped the patient’s decision.
To check that visibility, you have to ask the AI directly.
2. The six-level AI Visibility Score
Before you run the test, get the scoring scale clear. Every AI answer about your clinic falls into one of six levels. We use this scale because “we showed up once” is not a metric.
1. **Invisible.** Your name does not appear in the answer at all. The patient never knows you exist for this procedure.
2. **Misrepresented.** Your name appears but the model gets your specialty, location, or credentials wrong. Worse than not appearing.
3. **Mentioned.** Your name appears as part of a list, but not as a recommendation. The patient sees you as one option among many.
4. **Recommended, uncited.** The model recommends you but provides no source. The recommendation is real but the model cannot defend it confidently.
5. **Recommended with thin citation.** The model recommends you and cites one source. The patient can validate the answer, but the evidence base is fragile.
6. **Recommended with source depth.** The model recommends you with multiple independent sources and accurate procedure context. This is full visibility.
Tersefy internal observations across audited Tijuana clinic sites in 2025 found that most clinics score between level 1 and level 3 across most prompts. Some get to level 4 on their main procedure. Almost nobody scores level 5 or 6 across the board. That gap is the work.
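If you want the scale in a machine-readable form while you log runs, it maps cleanly onto an ordered enum. A minimal sketch in Python; the names and the helper function are ours, for illustration only, not part of any tool mentioned in this post.

```python
from enum import IntEnum

class VisibilityLevel(IntEnum):
    """The six-level AI Visibility Score, lowest to highest."""
    INVISIBLE = 1                  # name never appears in the answer
    MISREPRESENTED = 2             # appears, but specialty/location/credentials are wrong
    MENTIONED = 3                  # listed among options, not recommended
    RECOMMENDED_UNCITED = 4        # recommended, no source given
    RECOMMENDED_THIN_CITATION = 5  # recommended, exactly one source
    RECOMMENDED_SOURCE_DEPTH = 6   # recommended, multiple independent sources

def patient_can_verify(level: VisibilityLevel) -> bool:
    """A recommendation is verifiable only from thin citation upward."""
    return level >= VisibilityLevel.RECOMMENDED_THIN_CITATION
```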
If you cannot describe your AI visibility as a number on a six-point scale, you do not have a baseline. You have a feeling.
3. The five tools to test on
Open these five tools, each in its own tab. Use a browser window where you are not personally signed in to Google or any other service that could bias the results, or use a guest profile.
- ChatGPT at chat.openai.com. Use the free version with web search enabled.
- Gemini at gemini.google.com. Free version is fine.
- Claude at claude.ai. Free version is fine. Note that Claude has weaker grounding than the others, so absences here are less informative than presences.
- Perplexity at perplexity.ai. Most citation-dense surface and the one that most often shows you the source URLs directly.
- Google search with AI Overviews (AIO) at google.com. AI Overviews do not trigger for every query, and their appearance varies by query type, location, and account context. If an AI Overview does not appear, record that too. No overview is also a result.
The reason for testing all five is that they fail differently. In our clinic tests, ChatGPT and Perplexity tend to surface more recent or niche sources. Gemini and Google AI Overviews usually lean harder on sources Google already trusts, which means established directories and large clinic networks. Claude leans cautious on commercial recommendations, especially in healthcare. Treat that as a pattern to investigate, not a universal rule. A clinic that wins ChatGPT and loses Gemini is a different problem from a clinic that wins Gemini and loses ChatGPT. Your fix depends on which surface you are losing.
4. What prompts should clinics run?
Five prompt types. Type each into all five tools. Use them as written, do not soften them, do not lead the model.
| Prompt type | Example | What it reveals |
|---|---|---|
| Best provider | “Who are the best plastic surgeons in Tijuana for a deep plane facelift?” | Whether your clinic enters the patient shortlist for this procedure. |
| Safety | “Which clinic in Tijuana is safest for US patients considering a tummy tuck?” | Whether AI trusts your safety signals enough to defend the recommendation. |
| Comparison | “Compare Dr. [Your Name] with other surgeons in Tijuana for rhinoplasty.” | Whether your entity is well-defined enough for direct comparison. |
| Validation | “Is Dr. [Your Name] in Tijuana a good option for a facelift?” | Whether AI can defend a recommendation about you specifically. |
| Education | “What should I know before getting a facelift in Tijuana?” | Whether your content surfaces in informational answers above commercial intent. |
Replace the bracketed placeholders with your real procedure and your real name. Run each prompt three times in separate sessions across each tool. Three runs is the minimum to distinguish signal from random variation. That is the 5 × 5 × 3 test: 5 prompts, 5 platforms, 3 runs per prompt, equals 75 observations.
If you offer multiple procedures, repeat the test for the second and third most common ones. The visibility profile of a clinic is rarely uniform across procedures. We have seen clinics score at source depth on facelift and sit invisible on rhinoplasty in the same week.
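If a blank spreadsheet feels error-prone, you can pre-build the 75-row run sheet so no prompt-platform-run combination gets skipped. A minimal sketch in Python; the filename and column names are placeholders, and you would repeat it per procedure if you test more than one.

```python
from itertools import product
import csv

PROMPT_TYPES = ["best provider", "safety", "comparison", "validation", "education"]
PLATFORMS = ["ChatGPT", "Gemini", "Claude", "Perplexity", "Google AI Overviews"]
RUNS = [1, 2, 3]

# 5 prompts x 5 platforms x 3 runs = 75 rows, one per observation.
with open("ai_visibility_runs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt_type", "platform", "run",
                     "appeared", "level", "verbatim_quote", "cited_sources"])
    for prompt_type, platform, run in product(PROMPT_TYPES, PLATFORMS, RUNS):
        writer.writerow([prompt_type, platform, run, "", "", "", ""])
```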
Vertical-specific prompt examples
If you are not a plastic surgery clinic, swap the procedure for the one your patients ask about most.
Plastic surgery: best deep plane facelift surgeon in Tijuana / safest tummy tuck clinic in Tijuana for US patients / compare Dr. [Name] for mommy makeover in Tijuana.
Bariatric surgery: safest gastric sleeve surgeon in Tijuana / best bariatric surgeon in Tijuana for US patients / gastric sleeve Tijuana reviews.
Dental: best dental implants clinic in Tijuana / is dental work in Tijuana safe for Americans / compare Tijuana dental clinics for full mouth restoration.
5. What data should you record?
Open a spreadsheet or a notebook. For each prompt-tool combination, record:
- Whether your name appeared at all
- If yes, your position in the answer (first named, third, mid-list, last)
- What the model said about you, copy-pasted verbatim
- Whether the model cited any sources at all, and which ones
- Whether the cited sources were accurate (your real domain, your real CMCPER board profile, your real GBP, or generic third parties)
- Which competitors the model named, and what it said about them
- Whether the model got your specialty correct
- Whether the model got your clinic name correct
- Whether the model mentioned a price range, and whether the range matched your actual range
This is tedious. It is also the only way to do it. Skipping the record-keeping is how surgeons end up six months later with a vague sense that AI visibility is a problem and no concrete data on what to actually fix.
The most important field is not position. It is the exact sentence AI used to describe your clinic. That sentence shows which source is doing the work. If the quote sounds like it was lifted from your homepage, your homepage is doing the work and your third-party signals are absent. If the quote sounds like it was lifted from a directory, your homepage is invisible and the directory is the only thing the model can use.
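If you would rather keep a structured log than a free-form spreadsheet, the fields above translate directly into one record per prompt-tool-run combination. A minimal sketch, with field names of our own choosing:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Observation:
    """One row of the 5 x 5 x 3 test: a single prompt on a single tool, one run."""
    prompt_type: str                   # best provider, safety, comparison, validation, education
    platform: str                      # ChatGPT, Gemini, Claude, Perplexity, AI Overviews
    run: int                           # 1, 2, or 3
    appeared: bool                     # did your name appear at all
    position: Optional[str] = None     # first named, third, mid-list, last
    verbatim_quote: str = ""           # exactly what the model said about you
    cited_sources: list[str] = field(default_factory=list)  # URLs or source names
    sources_accurate: Optional[bool] = None  # your real domain/profiles vs. generic third parties
    competitors_named: list[str] = field(default_factory=list)
    specialty_correct: Optional[bool] = None
    clinic_name_correct: Optional[bool] = None
    price_range_mentioned: bool = False
    price_range_accurate: Optional[bool] = None
```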
6. How to interpret the results
After you have recorded 75 data points, patterns emerge. Look for one of these five.
| Pattern | What it usually means | First fix |
|---|---|---|
| A. ChatGPT and Perplexity yes, Gemini and AIO no | Your domain is recent or sits below Google’s authority threshold for healthcare | Build trusted third-party citations on sites Google already trusts |
| B. Gemini and AIO yes, ChatGPT and Perplexity no | Your Google footprint is solid but recent AI-grounded sources have not picked you up | Procedure-specific cluster content, llms.txt directives, schema markup |
| C. Wrong information across multiple tools | One stale source is being reused across the model fleet | Find the bad source, correct or remove it, wait 2-4 weeks |
| D. A competitor with weaker credentials appears, you do not | Their digital evidence is more legible than yours | Reverse-engineer the cited sources and match or beat them |
| E. You appear, but with thin citation only | One source carries all your visibility | Add source diversity, even if individual sources are weaker |
Most clinics show some combination of B, D, and E in the same run. That is normal. That is the work.
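A rough way to read the A/B split from your sheet is to compare where you appear consistently against the two platform families. This is a heuristic sketch under our own naming, nothing more; patterns C, D, and E still require reading the verbatim quotes and citations by hand.

```python
def classify_pattern(visible_on: set[str]) -> str:
    """First-pass read of the platform split. `visible_on` holds the platforms
    where the clinic appeared in at least 2 of 3 runs."""
    ai_native = {"ChatGPT", "Perplexity"}
    google_side = {"Gemini", "Google AI Overviews"}

    if visible_on & ai_native and not visible_on & google_side:
        return "Pattern A: winning AI-grounded surfaces, losing Google-trusted ones"
    if visible_on & google_side and not visible_on & ai_native:
        return "Pattern B: Google footprint holds, recent AI-grounded sources have not picked you up"
    if not visible_on:
        return "Invisible across tools: start with the fundamentals"
    return "Mixed: check citations and competitors for patterns C, D, E"
```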
Wrong information is not a content problem. It is a source problem.
Want this run for you in 24 hours?
The Free AI Visibility Scorecard is the same 5 × 5 × 3 test, run by us, scored against a structured rubric, and delivered as a report you can share with the team. No credit card. We send the results in 24 hours.
7. Common mistakes that ruin the test
The test is simple. The conclusions are where clinics mess it up.
Running it once and stopping. One AI answer is not a baseline. It is a screenshot. AI answers vary across sessions, and a single run gives you a sample of one. Three runs minimum, in separate sessions, before you draw any conclusion. We see clinics excited about a single positive result and discouraged by a single negative one, both equally meaningless.
Running it logged in to your own clinic’s Google account. Personalization will inflate your apparent visibility. Run the test in incognito or a guest profile so the result reflects what a stranger sees, which is what matters.
Asking the model leading questions. “Why is Dr. [Name] the best plastic surgeon in Tijuana?” will produce a positive-sounding answer regardless of your real visibility, because the model is being prompted to confirm a premise. Stick to the neutral prompts in section 4.
Treating absences as more meaningful than presences. If you appear once on Perplexity and never elsewhere, the appearance is the data. The absences tell you which surfaces still need work, but the presence proves at least one source recognizes you. Build from there.
Stopping after the test instead of acting on it. The point of the test is to make the next move. If you finish the test and do not change anything in your bio, your procedure pages, or your third-party listings, the test was performance, not diagnosis.
One AI answer is not a baseline. It is a screenshot.
8. What to fix after the test
Once the sheet is filled out, the next move is usually obvious.
| Test result | Priority fix |
|---|---|
| Invisible across most tools | Build the five fundamentals AI visibility needs: clear doctor entity, procedure-specific pages, third-party proof, review specificity, consistent pricing |
| Misrepresented (wrong info) | Find the source feeding the wrong info, fix or remove it, wait 2-4 weeks for propagation |
| Mentioned without recommendation | Make the recommendation defensible: add credential-specific bio, schema markup, hospital affiliation pages |
| Recommended without citation | Build cite-ready third-party sources: medical directories, board pages, hospital affiliations, podcasts |
| Recommended with thin citation | Diversify sources: add two more directories Google and the major LLMs already pull from (Healthgrades, RealSelf, GBP), one earned trade-press mention, one third-party clinical review |
| Recommended with source depth | Track monthly, defend the position, do not assume it holds without ongoing source maintenance |
The cited URL is the clue. Do not ignore it. If the same competitor appears across tools, the answer is in the sources the model is using for them. Study those before you blame the model.
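The “schema markup” fix in the table above usually means publishing structured data on the surgeon’s bio page so the entity is unambiguous. A minimal sketch of what that JSON-LD might contain, written here as a Python dict; every value is a placeholder, and schema.org’s Physician type is one reasonable choice, not the only one.

```python
import json

# Placeholder values throughout; swap in the surgeon's real details.
physician_jsonld = {
    "@context": "https://schema.org",
    "@type": "Physician",
    "name": "Dr. Example Surgeon",
    "medicalSpecialty": "PlasticSurgery",
    "url": "https://www.example-clinic.com/dr-example-surgeon",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Tijuana",
        "addressRegion": "Baja California",
        "addressCountry": "MX",
    },
    "sameAs": [
        "https://www.example-directory.com/profile/dr-example-surgeon",
    ],
}

# Embed on the bio page inside a <script type="application/ld+json"> tag.
print(json.dumps(physician_jsonld, indent=2))
```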
9. When the self-test is enough vs when it is not
The self-test is enough if you are an early-stage clinic with one or two procedures, you can implement fixes yourself, and you mostly need to confirm the work is moving the numbers. Run it monthly, log the results, ship fixes, run it again.
The self-test is not enough if you offer multiple procedures, serve cross-border patients, have multiple surgeons, or need to attribute changes in inbound consultation requests to specific GEO (generative engine optimization) interventions. The self-test tells you what happened. The audit tells you why it happened and which URLs or sources need to change first.
The deeper version, the Cross-Border GEO Audit, runs the same prompts at scale across a defined corpus, scores them against a rubric, maps the source pattern explicitly, and delivers a gap inventory tied to specific URLs to fix. That is what you pay for. The Scorecard is free, the Audit is paid, and the Audit is credited toward GEO Setup if you continue.
Do the self-test first even if you plan to hire someone. It makes the sales call harder to fake.
Ready for the deep diagnostic?
The Cross-Border GEO Audit is $997, delivered in 3 business days, and credited toward GEO Setup if you continue within 30 days. Full prompt corpus, full competitor analysis, source map, gap inventory tied to specific URLs.
Quick answers
Do I need a paid tool to test if ChatGPT mentions my clinic?
No. The 5 × 5 × 3 test in this post uses the free versions of ChatGPT, Gemini, Claude, Perplexity, and Google search. The free tools tell you what the patient sees, which is the only baseline that matters.
Why do my results change every time I ask the same question?
AI answers are non-deterministic. Run each prompt three times in separate sessions and treat the consistent appearances as signal. One AI answer is not a baseline. It is a screenshot.
What if AI gets my credentials wrong?
Wrong information is not a content problem. It is a source problem. Find the source feeding the wrong info, fix it or remove it, and wait two to four weeks for the correction to propagate.
How many prompts are enough for a real test?
Five prompts on five platforms with three runs each. That is 75 observations. Anything less is a sample of one. Anything more before you fix the obvious gaps is procrastination.
Should I test in English or Spanish?
Both, if you serve patients in both languages. The same query in Spanish often returns a different competitor set than the English version. Tijuana clinics serving US patients should win the English query first.
What is the difference between this self-test and the Free AI Visibility Scorecard?
The self-test takes 30 minutes and you do it. The Scorecard is the same test run by us, scored against a structured rubric, and delivered as a report you can share. Same diagnostic, more rigor, same price (zero).
Does this work for clinics outside Tijuana?
Yes. The 5 × 5 × 3 method is geography-agnostic. Replace “Tijuana” with your city in the prompts and run it the same way. Cross-border medical tourism clinics will see the most asymmetric results because their AI answers vary across English and Spanish queries.
What to do next
Run the test this week. If you are reading this on a Sunday morning, you can finish before lunch. Open the five tools, run the five prompts three times each, log everything in a spreadsheet, and read the results against the patterns in section 6.
If the gap looks larger than what you can fix yourself, the Free AI Visibility Scorecard gives you a structured version of the same test in 24 hours. If the test surfaces a gap that requires foundation work, the full guide on AI visibility for Tijuana surgeons covers the five fundamentals AI needs before it can recommend you reliably.
Sources change. Competitors publish. Directories update. Models refresh. The clinics winning the AI second-opinion conversation are the ones who measure monthly and ship fixes between measurements. The ones still blaming Meta and Google a year from now are the ones who never built the baseline in the first place.
Sources
- Aggarwal, P. et al. (2024). GEO: Generative Engine Optimization. Princeton University. arXiv:2311.09735. Reference for the structural-changes-lift-visibility framing in section 1.
- KFF Health Tracking Poll (2025). Health information seeking through AI tools. Reference for the patient-second-opinion behavior framing throughout.
- AirOps (2025). Source-attribution study across LLM-grounded answers. Reference for the source-pattern analysis in section 6.
- Patients Beyond Borders (2024). Medical Tourism Statistics & Facts, Mexico chapter. Reference for cross-border patient context.
- Mexican Council of Plastic, Aesthetic, and Reconstructive Surgery (CMCPER), public certification registry. Referenced in section 8.
- Tersefy internal observations (2025-2026), VIDA Wellness & Beauty Center reference implementation, n = 4 surgeons, 12-month measurement window. Cited inline in section 1 and section 2.
Version history (2 versions)
- v2.0 (2026-05-04): Editorial consolidation. Added definitional block, named the 5 × 5 × 3 method, converted prompts/patterns/fixes to tables, removed unsourced 90% claim, added 2 pull-quotes, restructured CTAs and Quick answers.
- v1.0 (2026-05-04): Initial publication of the 30-minute AI visibility self-test for clinics.