Key Takeaways
- Schema.org, in collaboration with Google, launched a public dataset on June 4, 2026, showing aggregate usage statistics for every Schema.org Type and Property across the public web.
- Data is updated monthly and presented in popularity buckets (such as “1M to 10M domains”) rather than exact counts, to protect privacy and filter daily noise.
- The May 2026 dataset contains 5,545 entries across 958 Types and 4,587 Predicates, but just 12 Types, only 1.3 percent of the total, have reached the highest adoption tier of over 10 million domains.
- High adoption does not automatically mean high SEO value; some widely used types exist mainly as crawl-scale infrastructure rather than rich-result triggers.
- A separate independent audit of 5,000 production sites found that a large share of deployed schema markup is technically broken, even on sites that “adopted” structured data years ago.
For as long as schema markup has existed, choosing which types to implement has mostly come down to educated guesswork, blog posts repeating the same five “essential” schema types, and the occasional Google documentation page hinting at what triggers rich results. That changed on June 4, 2026, when Schema.org and Google jointly published the first public dataset showing exactly how many domains actually use each Type and Property across the web.
What the Dataset Actually Contains
The dataset comes straight from Google’s own crawl of the public web and is updated monthly. Rather than exact counts, which could raise privacy concerns or be distorted by daily noise, the data is presented in aggregated popularity buckets such as “1 million to 10 million domains” or “under 1,000 domains.” Schema.org described this approach as one that filters daily noise while still highlighting meaningful adoption trends for researchers and toolmakers.
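If you want to sort or filter by bucket, the labels first have to become numbers. The sketch below is one way to do that, assuming the labels look like the examples quoted in this article (“1M to 10M domains”, “under 1,000 domains”). The exact label spelling in the published files is an assumption here, and `parse_bucket` is a hypothetical helper name, not part of any Schema.org tooling.

```python
import re
from typing import Optional, Tuple

# Assumed multipliers for the suffixes seen in the article's example labels.
_SCALE = {
    "": 1,
    "k": 1_000, "thousand": 1_000,
    "m": 1_000_000, "million": 1_000_000,
    "b": 1_000_000_000, "billion": 1_000_000_000,
}
_NUMBER = re.compile(r"(\d+(?:\.\d+)?)\s*(thousand|million|billion|[kmb])?\b")


def parse_bucket(label: str) -> Tuple[int, Optional[int]]:
    """Return (lower, upper) domain bounds; upper is None for an open-ended top bucket."""
    text = label.replace(",", "").strip().lower()
    bounds = [int(float(number) * _SCALE[suffix]) for number, suffix in _NUMBER.findall(text)]
    if len(bounds) == 2:
        return bounds[0], bounds[1]
    if len(bounds) == 1 and text.startswith(("under", "fewer than", "less than", "<")):
        return 0, bounds[0]
    if len(bounds) == 1 and text.startswith(("over", "more than", ">")):
        return bounds[0], None
    raise ValueError(f"Unrecognized bucket label: {label!r}")


print(parse_bucket("1M to 10M domains"))    # (1000000, 10000000)
print(parse_bucket("under 1,000 domains"))  # (0, 1000)
```

Because the buckets are ranges, anything built on top of them should compare bounds or bucket order, never pretend to know an exact domain count.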
The most recent full dataset, covering May 2026, contains 5,545 entries spanning 958 Types and 4,587 Predicates (properties). That sounds like a rich, thriving vocabulary on the surface. The distribution tells a sharper story.
The Numbers: A Small Core, a Long Tail
Just 12 Itemtypes have reached the top adoption tier, found on more than 10 million distinct domains. These are the absolute bedrock of the modern web’s structured data, and they’re exactly the types you’d expect: Organization, Person, WebPage, BreadcrumbList, and ImageObject among them. That’s 12 of 958 Types, about 1.3 percent of the entire specification, carrying the overwhelming majority of real-world adoption weight.
One tier down, another 35 Itemtypes sit in the 1 million to 10 million domain range. This is where things get genuinely interesting for SEO purposes, because it includes Product, Review, Article, FAQPage, and LocalBusiness, the types Google actively uses to power visible rich results. FAQPage’s presence at this level is a notable signal in itself, underscoring how directly Google’s rich result features drive real adoption, even for a schema type with a fairly narrow use case.
Below that, things thin out fast. The vast majority of the 958 Types sit in lower buckets, including a substantial number with fewer than 1,000 domains using them. Schema.org’s own FAQ on the dataset addresses this directly: a low bucket doesn’t necessarily mean a type is useless. It often just reflects a small total addressable audience, like specialized medical or government schemas, where even strong adoption within that niche will never produce a large absolute domain count.
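The tier shares in this section are easy to check by hand. The short calculation below uses only the counts quoted in this article (958 Types, 4,587 Predicates, 5,545 entries, 12 top-tier Types, 35 second-tier Types); the variable names are illustrative.

```python
TOTAL_TYPES = 958
TOTAL_PREDICATES = 4_587
TOTAL_ENTRIES = 5_545

TOP_TIER = 12      # more than 10 million domains
SECOND_TIER = 35   # 1 million to 10 million domains
LOWER_TIERS = TOTAL_TYPES - TOP_TIER - SECOND_TIER


def share(count: int, total: int = TOTAL_TYPES) -> float:
    """Percentage of all Types, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Types plus Predicates should account for every entry in the dataset.
assert TOTAL_TYPES + TOTAL_PREDICATES == TOTAL_ENTRIES

print(share(TOP_TIER))     # 1.3
print(share(SECOND_TIER))  # 3.7
print(LOWER_TIERS, share(LOWER_TIERS))  # 911 95.1
```

In other words, roughly 95 percent of Types sit below the 1 million domain mark.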
Important Caveats Before You Act on This Data
A few things are worth understanding clearly before treating bucket position as a straightforward recommendation.
The data only reflects what Google’s crawler sees. Schema.org’s own documentation acknowledges that no single crawl captures the entire internet simultaneously, and the dataset inherits whatever biases come from Google’s crawl scope and indexing methodology. Pages blocked by robots.txt, for instance, contribute zero data here, regardless of what schema markup they might actually contain.
There’s also a related infrastructure wrinkle worth knowing. Google reduced Googlebot’s maximum crawl file size to 2MB in February 2026, an 86.7 percent reduction from the previous 15MB limit. Pages with large HTML files where schema markup sits deep in the document might not get fully processed under that smaller limit, which could mean some real-world structured data simply isn’t being counted, independent of how widely it’s actually deployed.
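A rough way to see whether this could affect a page is to measure where each JSON-LD block ends in the raw HTML. The sketch below assumes the limit is counted in uncompressed bytes and that “2MB” means 2 × 1024 × 1024 bytes. Both are assumptions for illustration, and `jsonld_end_offsets` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumption: the 2MB limit applies to uncompressed bytes, read as 2 MiB.
ASSUMED_LIMIT_BYTES = 2 * 1024 * 1024

_JSONLD = re.compile(
    r"""<script\b[^>]*\btype\s*=\s*["']application/ld\+json["'][^>]*>.*?</script\s*>""",
    re.IGNORECASE | re.DOTALL,
)


def jsonld_end_offsets(html: str, limit: int = ASSUMED_LIMIT_BYTES) -> List[Tuple[int, bool]]:
    """For each JSON-LD script block, return (end offset in UTF-8 bytes, ends within limit)."""
    results = []
    for match in _JSONLD.finditer(html):
        # Offsets are measured in encoded bytes, not characters.
        end_byte = len(html[: match.end()].encode("utf-8"))
        results.append((end_byte, end_byte <= limit))
    return results


small_page = '<html><script type="application/ld+json">{"@type": "Product"}</script></html>'
for end_byte, within_limit in jsonld_end_offsets(small_page):
    print(end_byte, "ok" if within_limit else "past the assumed limit")
```

If a block ends past the limit, moving the JSON-LD into the document head, or trimming inline scripts and styles ahead of it, is the kind of fix worth testing.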
Adoption volume and SEO value are not the same thing. Schema.org’s own documentation states this plainly: adoption volume alone does not determine SEO value, but it is a useful signal. A type with massive adoption, like BreadcrumbList, functions more as crawl-scale infrastructure than a rich-result trigger most users actually see. Meanwhile, a relatively rare specialized type might still be the single most important piece of markup for a niche where it applies.
FAQPage specifically has a complicating wrinkle right now. Separate from the usage statistics dataset, Search Console’s FAQ rich result filter and Rich Results Test support for FAQPage are scheduled for removal in June 2026. The schema type itself remains valid and the markup doesn’t need to be removed, but it stops functioning as a Google rich result lever going forward. It may still matter for AI systems parsing your content, just not for the traditional rich snippet it used to produce.
Why So Much Deployed Schema Is Still Broken
Adoption data answers “is anyone using this,” not “is it working correctly,” and a separate independent audit released in April 2026 makes that gap painfully clear. Researchers pulled a stratified sample of 5,000 production sites across eight CMS platforms and ran every URL through a Rich Results Test plus a cross-check against the reference vocabulary.
The findings cut against the usual “adoption is rising” narrative. Custom or raw-HTML sites had the lowest overall adoption at 19 percent but the highest per-instance validity, suggesting that when a team hand-rolls its schema, it tends to test it properly. Shopify, by contrast, ships Product schema by theme default on 89 percent of stores, but only 31 percent of those also pair it with Organization schema, leaving an incomplete entity picture even on sites that technically “have” structured data.
The broader pattern the audit surfaced: most production schema deployments are silently broken in some way, frequently emitting JSON-LD that fails Google’s own validator on at least one required field. Structured data has historically been the kind of technical SEO task an engineering team ships once in a sprint and then forgets about, and that pattern hasn’t really changed even as awareness of schema’s importance has grown.
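A check for this kind of silent breakage can be very small. The sketch below parses a JSON-LD string and reports missing fields per type. The `ASSUMED_REQUIRED` map is a placeholder for illustration, not Google’s authoritative list of required properties, and this is not how the audit itself was run; take the real requirements from Google’s documentation for each rich result type.

```python
import json
from typing import Any, Dict, Iterator, List, Set

# Placeholder rules for illustration only; replace with the documented requirements.
ASSUMED_REQUIRED: Dict[str, Set[str]] = {
    "Product": {"name", "offers"},
    "Organization": {"name", "url"},
}


def iter_nodes(data: Any) -> Iterator[dict]:
    """Yield top-level typed nodes, including those listed under @graph."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        if "@graph" in data:
            yield from iter_nodes(data["@graph"])
        if "@type" in data:
            yield data


def missing_fields(jsonld_text: str, required: Dict[str, Set[str]] = ASSUMED_REQUIRED) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    try:
        data = json.loads(jsonld_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    problems = []
    for node in iter_nodes(data):
        declared = node["@type"]
        declared = declared if isinstance(declared, list) else [declared]
        for type_name in declared:
            if not isinstance(type_name, str):
                continue
            for field in sorted(required.get(type_name, set()) - set(node)):
                problems.append(f"{type_name}: missing {field}")
    return problems


print(missing_fields('{"@type": "Product", "offers": {}}'))  # ['Product: missing name']
```

This only inspects top-level nodes and `@graph` members, so it complements the Rich Results Test rather than replacing it.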
How to Actually Use This Data
Treat the usage statistics dataset as a starting point for prioritization, not a final verdict. If you’re deciding what to implement next on a site, the 1 million to 10 million domain tier (Product, Review, Article, FAQPage, LocalBusiness) is a reasonable, evidence-backed shortlist of types Google’s own systems are clearly built to recognize and reward, with the caveat noted earlier that FAQPage is losing its Google rich result.
But pair that decision with an actual validity check on what you already have, not just what you’ve technically added. Given how common silently broken implementations are, even on platforms with high nominal adoption, running your existing schema through Google’s Rich Results Test is arguably more valuable right now than adding a new schema type you don’t yet have.
For niche or vertical-specific schema, don’t dismiss a type just because it sits in a lower popularity bucket. A specialized type used by a focused community (academic publishers, broadcasters, government entities) can still be the most important piece of markup for that niche, even with a comparatively small total domain count.
Common Mistakes to Avoid
- Equating high adoption with high SEO impact. Some of the most widely used types are infrastructure-level markup, not rich-result triggers most users will ever see.
- Continuing to optimize FAQPage purely for Google rich results. Search Console’s FAQ rich result support is being removed in June 2026; the schema can stay for AI parsing purposes but shouldn’t be your rich-result strategy anymore.
- Assuming a low adoption bucket means a schema type isn’t worth using. Niche types often sit in lower buckets simply because their addressable audience is small, not because they lack value within that audience.
- Adding new schema types without validating existing markup first. Independent audits show a large share of deployed schema is technically broken; fixing what exists often matters more than adding more.
- Ignoring the crawl file size change when auditing why your schema “isn’t showing up.” If your markup sits deep in a large HTML document, Googlebot’s reduced 2MB crawl limit could be a factor.
Expert Tips
Compare monthly dataset releases over time rather than looking at a single month in isolation. Schema.org structured the data specifically so a type moving from one popularity bucket into a higher one signals meaningful adoption growth, which is a much stronger prioritization signal than a single snapshot.
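Comparing two releases amounts to checking which terms changed bucket. The sketch below assumes each release has already been loaded into a dictionary mapping a term name to its bucket label. The bucket labels, their order, and the term names are all hypothetical stand-ins, not values from the published dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical bucket labels, ordered from lowest to highest adoption.
ASSUMED_BUCKET_ORDER = [
    "under 1K", "1K to 10K", "10K to 100K", "100K to 1M", "1M to 10M", "over 10M",
]


def bucket_moves(
    previous: Dict[str, str],
    current: Dict[str, str],
    order: List[str] = ASSUMED_BUCKET_ORDER,
) -> List[Tuple[str, str, str, int]]:
    """List (term, old bucket, new bucket, steps moved) for terms present in both releases."""
    rank = {label: position for position, label in enumerate(order)}
    moves = []
    for term in sorted(previous.keys() & current.keys()):
        steps = rank[current[term]] - rank[previous[term]]
        if steps != 0:
            moves.append((term, previous[term], current[term], steps))
    return moves


# Made-up example data, not real dataset values.
earlier = {"ExampleTypeA": "100K to 1M", "ExampleTypeB": "1K to 10K"}
later = {"ExampleTypeA": "1M to 10M", "ExampleTypeB": "1K to 10K"}
print(bucket_moves(earlier, later))
# [('ExampleTypeA', '100K to 1M', '1M to 10M', 1)]
```

A positive step count means the term climbed into a higher bucket; a negative one means it dropped.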
If you manage schema across a CMS-based site, audit whether your platform’s default schema implementation is actually complete. The Shopify example of high Product adoption but low Organization pairing is a useful reminder that “the platform handles it” doesn’t always mean it handles it fully.
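One way to audit for that kind of gap is to collect every `@type` on a page and check it against the companions you expect. The sketch below works on JSON-LD that has already been parsed with `json.loads`. The Product and Organization pairing comes from the audit example in this article, while counting nested nodes toward a pairing is an assumption, since the audit’s exact definition isn’t given here.

```python
from typing import Any, Dict, List, Set

# Pairing taken from the Shopify example; extend with your own expectations.
ASSUMED_COMPANIONS: Dict[str, Set[str]] = {"Product": {"Organization"}}


def collect_types(node: Any) -> Set[str]:
    """Gather every @type string in a parsed JSON-LD structure, nested nodes included."""
    found: Set[str] = set()
    if isinstance(node, list):
        for item in node:
            found |= collect_types(item)
    elif isinstance(node, dict):
        declared = node.get("@type", [])
        declared = declared if isinstance(declared, list) else [declared]
        found |= {name for name in declared if isinstance(name, str)}
        for value in node.values():
            found |= collect_types(value)
    return found


def missing_companions(
    page_jsonld: Any, companions: Dict[str, Set[str]] = ASSUMED_COMPANIONS
) -> Dict[str, List[str]]:
    """Map each type present on the page to the expected companion types it lacks."""
    found = collect_types(page_jsonld)
    return {
        type_name: sorted(needed - found)
        for type_name, needed in companions.items()
        if type_name in found and needed - found
    }


print(missing_companions([{"@type": "Product", "name": "Mug"}]))
# {'Product': ['Organization']}
```

An empty result means every expected companion was found somewhere in the page’s markup.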
What’s Next
Schema.org has explicitly invited other search engines and web-scale crawlers to publish their own adoption statistics in the same open data format, with the stated goal of eventually building a multi-provider view rather than relying solely on Google’s crawl perspective. Whether Microsoft, or independent crawlers like Common Crawl, take up that invitation will matter a lot for how complete and unbiased this picture eventually becomes.
It’s also worth watching how this dataset interacts with the broader EU Digital Markets Act proceedings against Google, where the European Commission’s preliminary findings in April 2026 pushed toward requiring Google to share search data with competing search engines on fair terms. A more open, multi-provider structured data adoption picture would fit that broader regulatory direction.
Conclusion
Schema.org and Google’s new usage statistics dataset gives the SEO industry something that didn’t exist before: real, crawl-scale evidence of which structured data types are actually adopted across the web, updated monthly and broken into clear popularity tiers. Just 12 Types have reached the highest tier, while the vast majority of the 958 tracked Types sit in low-adoption buckets that often reflect niche relevance rather than low importance. The data is a genuinely useful prioritization tool, but it answers a different question than “is my schema actually working,” which is why pairing it with a real validation check on your existing markup matters just as much as deciding what to add next.
FAQ
What is the new Schema.org usage statistics dataset?
It’s a public dataset, launched jointly by Schema.org and Google on June 4, 2026, showing how many domains use each Schema.org Type and Property, based on Google’s own web crawl.
How often is the dataset updated?
Monthly. Schema.org said monthly updates are sufficient because web adoption trends generally change slowly, and each release goes through manual validation before publication.
How many schema types have reached the highest adoption tier?
Just 12 Itemtypes, found on more than 10 million domains, representing about 1.3 percent of all Types in the specification.
Which schema types are in the second-highest adoption tier?
Another 35 types fall into the 1 million to 10 million domain range, including Product, Review, Article, FAQPage, and LocalBusiness.
Does high adoption mean a schema type is more valuable for SEO?
Not necessarily. Schema.org’s own documentation states that adoption volume alone does not determine SEO value, though it is a useful signal.
Is FAQPage schema still worth using?
The schema type remains valid, but Search Console’s FAQ rich result filter and Rich Results Test support are being removed in June 2026. It may still help AI systems parse your content, just not as a traditional Google rich result anymore.
Why might my schema markup not be showing up in the dataset or in rich results?
Possible causes include broken markup or missing required fields, a robots.txt block on the page, or schema that sits so deep in a large HTML file that it falls beyond Googlebot’s reduced 2MB crawl limit, introduced in February 2026.
How accurate is structured data deployment across the web really?
An independent audit of 5,000 production sites in April 2026 found that a large share of deployed schema fails Google’s own validator on at least one required field, despite nominal adoption looking healthy.
Will other search engines publish their own usage statistics?
Schema.org has invited other search engines and web-scale crawlers to publish data in the same open format, but as of now the dataset reflects Google’s crawl specifically.
Where can I access the raw dataset?
It’s available in CSV and JSON formats on the official Schema.org GitHub repository.

