Search is being rebuilt. ChatGPT, Perplexity, Claude, and Google AI Overviews don’t rank pages — they read them, extract claims, and decide what to cite. We wanted to know: how ready is the internet for that shift?
So we scanned 270 of the most recognized websites in the world across 14 industries — Stripe, Amazon, the New York Times, Klaviyo, OpenAI, Coursera, Reddit, Booking.com, GitHub, and 261 more. We scored each across the six measurable AEO factors: FAQ schema, HowTo schema, semantic HTML, llms.txt presence, robots.txt access for AI crawlers, and quick-answer blocks.
The average website scored 55/100. Only 7.9% scored above 70. The single most-failed factor: FAQ schema, missing on 97.6% of sites.
Methodology
We built a free scanner at aeoscore.io that runs 6 checks against any URL. For this study, we scanned the homepage of 270 domains across 14 industries: SaaS B2B, SaaS B2C, developer tools, AI companies, media, marketing blogs, e-commerce, fintech, travel, healthcare, education, fashion, top Y Combinator portfolio companies, and indie SaaS.
Each site receives a 0-100 score weighted across the six factors: Quick Answer Blocks (20%), FAQ Schema (20%), Semantic HTML (20%), llms.txt (15%), AI Bot Access via robots.txt (15%), and HowTo Schema (10%). Seven sites failed to scan due to authentication walls or fetch errors and were excluded; the remaining 263 form our dataset.
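Under these weights, a site's score is simply a weighted sum of its six 0-100 factor scores. A minimal sketch of the arithmetic (factor names are our own shorthand, not the scanner's actual field names):

```python
# Factor weights from the methodology (they sum to 1.0).
WEIGHTS = {
    "quick_answer": 0.20,
    "faq_schema": 0.20,
    "semantic_html": 0.20,
    "llms_txt": 0.15,
    "bot_access": 0.15,
    "howto_schema": 0.10,
}

def aeo_score(factors: dict[str, float]) -> float:
    """Weighted 0-100 AEO score from six 0-100 factor scores."""
    return sum(weight * factors[name] for name, weight in WEIGHTS.items())
```

For example, a site scoring 100 on every factor except llms.txt at 85 comes out at 97.75, consistent with Klaviyo's rounded 98 headline score reported below.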
One important caveat: this measures structural AEO readiness, not whether a site is actually cited by AI engines today. A site can score 100 and still not be cited (citations also depend on authority, content depth, and freshness). But sites scoring below 50 are leaving easy wins on the table.
The aggregate numbers
- Average score: 55/100
- Median score: 54/100
- Highest: Klaviyo at 98/100
- Lowest: Amazon at 29/100
- 7.9% of sites scored above 70 (the “good” threshold)
- 7.5% scored below 40 (failing)
- 66% scored above 50
A 55/100 average means most major sites have done some of the work — they’re crawlable by AI bots, their copy has identifiable answer blocks — but they’ve missed the high-leverage structured signals. Most are leaving 30+ points of score on the table.
Where the internet actually fails
The pass rate per factor reveals which AEO levers are pulled and which are ignored. Sorted worst to best:
- FAQ Schema — 2.4% pass. Only 6 of 250+ sites have valid FAQPage JSON-LD on the homepage.
- llms.txt — 27.7% pass. Most sites haven’t adopted the convention.
- Semantic HTML — 44.7% pass. Most sites are div soup with no `<article>`, `<section>`, or `<header>` tags.
- AI Bot Access (robots.txt) — 84.6% pass. Most sites allow GPTBot, ClaudeBot, and PerplexityBot.
- Quick Answer Block — 89.0% pass. Most homepages have an extractable definition.
- HowTo Schema — 98.8% pass. Usually a default pass for non-tutorial pages.
The FAQ Schema disaster
FAQPage structured data is the single biggest gap in our entire dataset. AI engines treat structured Q&A as directly extractable: when ChatGPT looks for "what is X?" it preferentially cites pages where the Q&A is wrapped in machine-readable JSON-LD. 97.6% of the sites we scanned don’t have it. Plenty of those sites have FAQ pages — they just haven’t wrapped the content in schema. This is the single highest-leverage AEO fix because it takes 10 minutes and almost nobody has done it.
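For reference, this is what the fix looks like — a FAQPage JSON-LD block wrapping Q&A content the page already has (the question and answer below are placeholder text):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is answer engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Answer engine optimization (AEO) is the practice of structuring pages so AI search engines can extract and cite them."
      }
    }
  ]
}
</script>
```

Validate the markup with Google’s Rich Results Test or the schema.org validator before shipping — invalid JSON-LD counts as a miss.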
The llms.txt gap
llms.txt is a newer convention (proposed by Jeremy Howard of Answer.AI) — a plain-text file at /llms.txt that tells AI models what your site covers, which pages are most important, and how to cite you. 27.7% of sites have one. Adoption is concentrated in dev-tools and AI infra companies (Klaviyo, Coursera, Neon, Browserbase, Trigger.dev, Cursor, Ramp, Framer, Modal, Pinecone) — the teams that follow AI tooling news closely. Adoption will go mainstream in the next 12 months. Sites that adopt early get cited disproportionately.
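Per the proposal, /llms.txt is plain Markdown: an H1 with the site name, a blockquote summary, then sections of annotated links. A hypothetical sketch (all names and URLs below are placeholders):

```markdown
# Example Co

> Example Co is an email marketing platform for e-commerce brands.

## Docs
- [Getting started](https://example.com/docs/start): account setup and first campaign
- [API reference](https://example.com/docs/api): REST endpoints and authentication

## Optional
- [Changelog](https://example.com/changelog): release notes
```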
The semantic HTML problem
Less than half of the sites we scanned use proper semantic HTML on their homepage. Many are React or Next.js builds where the entire layout is nested `div` elements with no `<article>`, `<main>`, `<section>`, or `<header>` tags. AI crawlers can still extract text, but they have to guess at structure. Sites with clean semantic HTML score significantly higher across other factors too — because the same teams that care about HTML hygiene also implement schema and llms.txt.
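The fix is mechanical: swap the anonymous wrappers for the equivalent semantic landmarks. A before/after sketch:

```html
<!-- Before: div soup — the crawler has to guess which block is the content -->
<div class="page">
  <div class="top"><div class="links">…</div></div>
  <div class="body"><div class="inner">…</div></div>
</div>

<!-- After: same layout, but the structure is machine-readable -->
<header>
  <nav>…</nav>
</header>
<main>
  <article>
    <h1>What the page is about</h1>
    <section>…</section>
  </article>
</main>
```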
Industry leaderboard
Average AEO score by industry, ranked best to worst:
1. SaaS B2B — 65/100 (Klaviyo, Zendesk, Stripe lead)
2. YC Top — 61/100
3. Developer tools — 61/100 (GitHub, Vercel, Cursor)
4. Education — 58/100 (Coursera, edX, Khan Academy)
5. AI companies — 58/100 (Anthropic, OpenAI, Perplexity all in the mid-50s to mid-60s)
6. Indie SaaS — 57/100
7. Healthcare — 55/100
8. Fintech — 55/100
9. Marketing blogs — 55/100 (the AEO experts themselves)
10. E-commerce — 51/100
11. SaaS B2C — 50/100
12. Travel & hospitality — 49/100
13. Fashion & lifestyle — 48/100
14. Media / news — 43/100
Two findings stand out. First: SaaS B2B leads, almost certainly because the technical teams building those sites overlap with the people reading SEO and AI Twitter. Second: media and news sit dead last. The publications most invested in SEO over the past 20 years — WSJ (30/100), Bloomberg (35), TechCrunch (35), Vox (37) — are the ones least prepared for AI search.
That second finding is counterintuitive but explainable. Media sites optimized for Google’s legacy ranking algorithm, which rewards a different signal mix than AI search. AI engines need clean structured data and crawler permissions. Most media sites are tangled in DRM-style layouts, paywall logic, and aggressive anti-bot rules — exactly the wrong configuration for AEO.
The standouts
Klaviyo: the only 98
Klaviyo is the single best-scoring site in our study. They pass all six factors — FAQ schema, HowTo schema, llms.txt, quick answer, robots.txt, semantic HTML — every one at 100 except llms.txt at 85. The lesson: they treated AEO as a technical hygiene project, not a content project. Their homepage doesn’t have unusually good copy. It just has unusually clean structure.
Amazon: 29
Amazon is the lowest-scoring site in our entire dataset. The world’s largest e-commerce site fails 4 of 6 AEO checks: no FAQ schema, no llms.txt, semantic HTML score 0 (zero article/section/header tags on the homepage), and robots.txt at 30 (blocks several AI crawlers). They pass quick answer (70) and HowTo (default pass), but those alone don’t move the needle. If Amazon ran the standard AEO fixes, their score would jump 30+ points.
Other notable low scorers: WSJ (30), Yelp (30), Reddit (34), Bloomberg (35), TechCrunch (35), Cash App (35), Hyatt (35), Vox (37), and Indie Hackers (37). Most of these have something in common — they’re heavily dynamic JavaScript apps where the rendered HTML lacks structure, and many haven’t updated their robots.txt for the AI bot wave.
What gets a high score
Looking at the top 10, three patterns emerge. The high scorers (1) ship clean semantic HTML, (2) include FAQPage schema, and (3) have an llms.txt at the root. None of those are content decisions — they’re technical decisions that take a single afternoon.
Here are the six things that move a score from ~30 to ~80:
1. Add FAQPage JSON-LD schema to any page with Q&A content. Most sites already have FAQs, just no schema.
2. Create an llms.txt at /llms.txt with a brief site description and links to your most important pages.
3. Replace nested `div` elements with proper semantic HTML: `article`, `main`, `section`, `header`, `nav`.
4. Allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in your robots.txt.
5. Put a 40-60 word definition block immediately under the H1 on key pages.
6. Add HowTo schema for any tutorial-style content.
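Fix #4 is a few lines in robots.txt, using the four crawler user-agents named above. A sketch (note: a bot with no matching Disallow rule is already allowed, so explicit rules like these mainly matter when other rules block by default):

```text
# Allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```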
The first three alone typically add 30+ points to a score. We’ve seen sites go from 35 to 75 in an afternoon of work.
What this study doesn’t show
- We scanned the homepage only. Many sites have stronger AEO on their actual content pages (blog posts, docs, product pages).
- A high AEO score doesn’t guarantee citation by AI engines. Domain authority, content depth, and freshness all matter independently.
- We didn’t measure actual citations in ChatGPT, Perplexity, or Google AI Overviews. That’s a separate study, and we’re working on it.
- Some sites (Nordstrom, Delta, hers.co) returned 401 or network errors and were excluded — possibly bot defenses or paywalls.
- Schema validity matters: we checked for valid markup, not just presence, so a site with invalid FAQPage JSON-LD counts as a fail.
What you should do
If you’re reading this on a SaaS marketing or founder team: scan your own site. The chances are ~92% that you’ll score below 70. The fixes are technical and shippable in a few hours. AEO isn’t mature yet — that’s the opportunity. Whoever does the work first earns the citations.
If you’re scoring under 40, you’re likely missing FAQ schema, llms.txt, AND semantic HTML — three fixes that almost always move you to 70+. You don’t need a 6-month redesign. You need an afternoon.
Check your AEO score for free
Enter your URL and see how your site scores across all 6 AEO factors. No signup required.