129 Pages, Zero Structured Data

By Scott Finney • May 12, 2026

[Image: Documents flowing through an automated processing pipeline with stamps and tags. Generated with Grok.]

A national outdoor content publisher had 129 editorial pages — hunting guides, gear roundups, recipes, how-to articles — generating steady organic traffic. The content was good. The problem was invisible: none of it had structured data.

Without schema markup, those pages weren't eligible for Google's recipe carousels, FAQ accordions, list snippets, or featured placements. AI answer engines like Perplexity and ChatGPT Search had no structured signals to pull from either. The content existed, but search features beyond the standard blue link couldn't see it.

The team knew this. They just couldn't do anything about it. Writing correct JSON-LD schema for 129 pages means understanding five different schema types, extracting the right content from each page, cross-referencing authors, building breadcrumb trails, linking entities to knowledge bases, and validating every block before deployment. A senior technical SEO could handle 8–10 pages per day at quality. At that rate, 129 pages take roughly 13 to 16 working days: about three weeks of dedicated work, and time a lean content team didn't have.

So the pages sat without markup, month after month.

What We Built

We built an automated pipeline that reads a standard site crawl export and generates production-ready schema for every URL. It works in three passes, each adding depth:

Pass 1 — Mechanical extraction. Page title becomes the headline, meta description becomes the description, URL structure generates breadcrumbs. Every page gets labeled as a generic Article. No AI involved. This is what a dev team would build if handed the task — functional but blunt.
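
To make the mechanical pass concrete, here is a minimal sketch of what it might look like. The crawl-export column names (url, title, meta_description) and both function names are assumptions made for illustration; they are not taken from the actual pipeline.

```python
from urllib.parse import urlparse


def breadcrumbs_from_url(url: str) -> dict:
    """Build a BreadcrumbList from the path segments of a URL."""
    parsed = urlparse(url)
    root = f"{parsed.scheme}://{parsed.netloc}"
    segments = [s for s in parsed.path.split("/") if s]
    items = [{"@type": "ListItem", "position": 1, "name": "Home", "item": root + "/"}]
    path = ""
    for position, segment in enumerate(segments, start=2):
        path += "/" + segment
        items.append({
            "@type": "ListItem",
            "position": position,
            # Assumption: a readable name can be derived from the URL slug.
            "name": segment.replace("-", " ").title(),
            "item": root + path,
        })
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


def pass_one(row: dict) -> list[dict]:
    """Map one crawl-export row to a generic Article plus breadcrumbs. No AI involved."""
    article = {
        "@context": "https://schema.org",
        "@type": "Article",  # every page is labeled as a generic Article in this pass
        "headline": row["title"],
        "description": row["meta_description"],
        "url": row["url"],
    }
    return [article, breadcrumbs_from_url(row["url"])]
```

Everything here is a direct field-to-field mapping, which is why the result is functional but blunt: a recipe and a gear roundup come out looking identical.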

Pass 2 — Content analysis. AI reads the full body text of each page and makes editorial decisions: Is this actually a recipe? A gear roundup? A Q&A interview? It detects the correct schema type, rewrites vague headlines into content-specific ones, writes descriptions from the actual article body (not the often-weak meta description), and extracts type-specific data — ingredients and steps for recipes, ranked items for gear guides, question-answer pairs for interviews.
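
The model call itself depends on the provider, so it is left out here. What can be sketched is the step around it: taking the model's reply and turning it into something the rest of the pipeline can trust. The reply format (a JSON object with schema_type, headline, description, and type_data keys), the ContentAnalysis class, and parse_analysis are all hypothetical. The five allowed types are the ones reported in the results.

```python
import json
from dataclasses import dataclass, field

# The five schema types reported in the results.
ALLOWED_TYPES = {"Article", "ItemList", "FAQPage", "Recipe", "HowTo"}


@dataclass
class ContentAnalysis:
    schema_type: str
    headline: str
    description: str
    type_data: dict = field(default_factory=dict)  # e.g. ingredients, ranked items, Q&A pairs


def parse_analysis(raw_reply: str, fallback_headline: str, fallback_description: str) -> ContentAnalysis:
    """Parse a (hypothetical) JSON reply from the model, falling back to Pass 1 values."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    schema_type = data.get("schema_type")
    if not isinstance(schema_type, str) or schema_type not in ALLOWED_TYPES:
        schema_type = "Article"  # unknown or missing type: keep the generic label

    type_data = data.get("type_data")
    if not isinstance(type_data, dict):
        type_data = {}

    return ContentAnalysis(
        schema_type=schema_type,
        headline=data.get("headline") or fallback_headline,
        description=data.get("description") or fallback_description,
        type_data=type_data,
    )
```

Falling back to the Pass 1 headline and description means a malformed reply degrades a page to the baseline instead of breaking it.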

Pass 3 — Knowledge graph assembly. A second AI pass builds a complete @graph container for each page — the format Google, Bing, and AI answer engines prefer for understanding how entities relate. Each page gets interconnected nodes: Organization, WebSite, WebPage, the content node, BreadcrumbList, and Person. Entities mentioned in the text — wildlife species, geographic locations, regulations — get typed and linked to their Wikipedia entries.
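
As an illustration of how those nodes connect, the sketch below assembles a @graph by hand. In the pipeline this step is done by an AI pass, so this only shows the target shape, not the method. The @id naming scheme, the build_graph signature, and the use of mentions with sameAs for entity links are assumptions.

```python
def build_graph(site: str, org_name: str, page_url: str, content: dict,
                crumbs: list[tuple[str, str]], author: str,
                entities: list[dict]) -> dict:
    """Assemble one page's @graph with nodes cross-referenced by @id."""
    # Hypothetical @id scheme: fragment identifiers on the site and page URLs.
    org_id = f"{site}/#organization"
    website_id = f"{site}/#website"
    webpage_id = f"{page_url}#webpage"
    crumb_id = f"{page_url}#breadcrumb"
    person_id = f"{page_url}#author"

    # Assumption: the content node is a CreativeWork subtype (Article, Recipe,
    # HowTo, FAQPage), so author, publisher, and mentions apply to it.
    content_node = {
        **content,
        "@id": f"{page_url}#content",
        "mainEntityOfPage": {"@id": webpage_id},
        "author": {"@id": person_id},
        "publisher": {"@id": org_id},
        "mentions": entities,  # typed entities linked to Wikipedia via sameAs
    }
    breadcrumb_node = {
        "@type": "BreadcrumbList",
        "@id": crumb_id,
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": org_name, "url": site},
            {"@type": "WebSite", "@id": website_id, "url": site, "publisher": {"@id": org_id}},
            {"@type": "WebPage", "@id": webpage_id, "url": page_url,
             "isPartOf": {"@id": website_id}, "breadcrumb": {"@id": crumb_id}},
            content_node,
            breadcrumb_node,
            {"@type": "Person", "@id": person_id, "name": author},
        ],
    }
```

An entity passed in as {"@type": "Taxon", "name": "Mallard", "sameAs": "https://en.wikipedia.org/wiki/Mallard"} would end up under mentions on the content node. Because every relationship is an @id reference, each node is defined once and reused rather than repeated inline.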

Content-Aware, Not Template-Driven

Every schema block is generated from the actual page content, not from a template with fields swapped in.

A recipe page about smoked duck gets recipeIngredient arrays and recipeInstructions steps extracted from the article. A gear guide gets an ItemList with individually described items. A Q&A interview gets FAQPage with real question-answer pairs pulled from the text.
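
For readers who haven't seen these types, the three shapes look roughly like this. The builder functions are hypothetical helpers written for illustration; the property names (recipeIngredient, recipeInstructions, itemListElement, mainEntity, acceptedAnswer) are standard schema.org vocabulary.

```python
def recipe_node(name: str, ingredients: list[str], steps: list[str]) -> dict:
    """Recipe with ingredients and ordered steps extracted from the article."""
    return {
        "@type": "Recipe",
        "name": name,
        "recipeIngredient": list(ingredients),
        "recipeInstructions": [
            {"@type": "HowToStep", "position": i, "text": step}
            for i, step in enumerate(steps, start=1)
        ],
    }


def item_list_node(name: str, items: list[tuple[str, str]]) -> dict:
    """ItemList for a gear guide; each item is a (name, description) pair."""
    return {
        "@type": "ItemList",
        "name": name,
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": item_name, "description": description}
            for i, (item_name, description) in enumerate(items, start=1)
        ],
    }


def faq_node(pairs: list[tuple[str, str]]) -> dict:
    """FAQPage built from (question, answer) pairs pulled from the text."""
    return {
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
```

The structure of each builder is fixed, but every value in it comes from the page. That is the difference from a template with generic descriptions swapped in.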

This matters because search engines evaluate schema quality. Template schema with generic descriptions gets ignored. Content-matched schema with specific, accurate structured data earns rich results.

By the Numbers

Pages with structured data: 0 → 129

Schema types detected: 5 (Article, ItemList, FAQPage, Recipe, HowTo)

Pages with correct non-generic type: 52 (40%)

Headlines rewritten from vague to specific: 36 (28%)

Descriptions written from actual body content: 129 (100%)

Entity links to Wikipedia: Across all 129 pages

Validation pass rate: 129/129 (100%)

API errors: 0

Processing time: ~1 hour, run in batches (vs. ~3 weeks estimated manual)

API cost: Under $2
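
The validation figure above depends on what the validator checks, and those rules aren't listed here. The sketch below shows one plausible kind of check: per-type required properties. The REQUIRED table and validate_block are assumptions, deliberately minimal, and not a substitute for Google's Rich Results Test or the schema.org validator.

```python
import json

# Hypothetical minimum property sets per type; real rich-result
# requirements are longer and vary by search feature.
REQUIRED = {
    "Article": ["headline"],
    "Recipe": ["name", "recipeIngredient", "recipeInstructions"],
    "HowTo": ["name", "step"],
    "FAQPage": ["mainEntity"],
    "ItemList": ["itemListElement"],
}


def validate_block(block: dict) -> list[str]:
    """Return a list of problems; an empty list means the block passed."""
    try:
        json.dumps(block)  # must serialize before it can go in a <script> tag
    except (TypeError, ValueError) as exc:
        return [f"not JSON-serializable: {exc}"]

    problems = []
    if block.get("@context") != "https://schema.org":
        problems.append("missing or unexpected @context")

    # Check every node of a @graph, or the block itself if it is a single node.
    for node in block.get("@graph", [block]):
        node_type = node.get("@type")
        if not isinstance(node_type, str):
            problems.append("node without a single string @type")
            continue
        for prop in REQUIRED.get(node_type, []):
            if not node.get(prop):  # missing or empty
                problems.append(f"{node_type} missing {prop}")
    return problems
```

A pass rate like 129/129 would then mean that validate_block returned an empty list for every page.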

Built for Content Writers, Not Developers

The output wasn't a developer handoff. It was a workbook designed for a content writer to use directly:

Page Overview — plain-English descriptions of what each page's schema enables ("Google can show a recipe card with cook time, ingredients, and star ratings"), with before/after comparisons highlighted where the AI improved over the baseline.
Action Items — a prioritized to-do list in plain language ("Add a publish or updated date to this page"), sorted by impact: Fix First, Important, Nice to Have.
Schema Code — the actual <script> blocks, ready to copy and paste into a CMS.
How This Works — a plain-language explainer for stakeholders who need to understand what schema is and why it matters, without jargon.
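
The Schema Code sheet is the most mechanical of the four. A minimal sketch of rendering one block as a paste-ready tag follows; to_script_tag is a hypothetical name, and the escaping step is a general precaution rather than something the source describes.

```python
import json


def to_script_tag(block: dict) -> str:
    """Render a schema block as a JSON-LD <script> element for pasting into a CMS."""
    payload = json.dumps(block, indent=2, ensure_ascii=False)
    # Escape "</" so text inside the JSON cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so parsers read the same value back.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The type attribute application/ld+json is what tells crawlers the contents are structured data rather than executable JavaScript.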

What Changed for the Business

The content team went from knowing they needed schema to having production-ready code for every page, with a clear action list for the gaps they could close on the content side.

129 pages are now eligible for rich results they previously couldn't appear in. Recipe pages can surface in Google's recipe carousel. Gear guides can appear as list snippets. FAQ content can expand directly in search results. And the full @graph structure positions every page for AI answer engines that increasingly pull from well-structured sources.

The gap analysis also gave the team a concrete roadmap: add publish dates to the 45% of pages missing them, add author profile links to the 50% without them, and add image dimension meta tags site-wide. Each item is prioritized and written in language a content editor can act on.
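
A gap analysis like this can be sketched as a table of checks run against each page. The field names and the wording of two of the three actions are hypothetical, and so is the priority given to each gap: the three tier names come from the workbook, but how the gaps were actually ranked isn't stated.

```python
PRIORITY_ORDER = ["Fix First", "Important", "Nice to Have"]

# Hypothetical checks: (page field, priority tier, plain-language action).
# The tier assigned to each gap is an assumption for illustration.
CHECKS = [
    ("publish_date", "Fix First", "Add a publish or updated date to this page"),
    ("author_url", "Important", "Add a link to the author's profile page"),
    ("image_dimensions", "Nice to Have", "Add image width and height meta tags"),
]


def action_items(page: dict) -> list[tuple[str, str]]:
    """Return (priority, action) pairs for every gap on a page, highest impact first."""
    items = [
        (priority, action)
        for field_name, priority, action in CHECKS
        if not page.get(field_name)  # missing, None, or empty counts as a gap
    ]
    return sorted(items, key=lambda item: PRIORITY_ORDER.index(item[0]))


def share_missing(pages: list[dict], field_name: str) -> float:
    """Fraction of pages missing a field, e.g. for a '45% missing dates' summary."""
    if not pages:
        return 0.0
    return sum(1 for page in pages if not page.get(field_name)) / len(pages)
```

Keeping the action text in the table rather than in code means an editor can reword an instruction without touching the logic.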

How It Was Built

I drove the strategic decisions: the three-pass architecture, the choice to generate from full page content rather than metadata alone, the output format designed for content writers, and the entity-linking approach. I also defined the text-cleaning rules and the prompt engineering that controls what the AI extracts from each page.

The AI handled two roles. First, as the schema generation engine — reading 129 pages of body content, detecting the correct type for each, extracting structured data, writing content-specific headlines and descriptions, and assembling valid @graph containers. That's the work that would have taken three weeks. The AI completed 258 API calls with zero errors in about an hour, processing URLs in batches.
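
The call count is consistent with one API call per page in each of the two AI passes (129 × 2 = 258), though that breakdown is an inference and isn't stated outright. The batch size isn't stated either, so the value used in the sketch below is a placeholder, and both function names are hypothetical.

```python
from typing import Iterator, Sequence


def batches(urls: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def expected_calls(page_count: int, ai_passes: int = 2) -> int:
    """Assumes one API call per page per AI pass (an inference from the totals)."""
    return page_count * ai_passes


urls = [f"https://example.com/page-{n}" for n in range(129)]  # placeholder URLs
batch_sizes = [len(batch) for batch in batches(urls, size=25)]  # size=25 is arbitrary
```

With a size of 25, 129 URLs split into five full batches and one batch of four. Running in batches keeps a failure contained to one slice instead of forcing the whole run to restart.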

Second, as the implementation partner — writing the 14-module pipeline itself. I directed the architecture, reviewed output quality, and course-corrected when the initial format was too technical for the intended audience. The AI rebuilt the output layer from a six-sheet technical workbook into a four-sheet content-writer-friendly deliverable in a single session.

The total project, from empty directory to 129 validated schema blocks with a polished deliverable, was completed in a single working session.

Is Your Content Invisible to Search Features?

If your pages don't have structured data, they're competing for the standard blue link and nothing else. Every rich result slot — recipe cards, FAQ expansions, list snippets, AI answer citations — goes to someone who did the work.

The gap between "we should do this" and "it's done" used to be weeks of specialist time. It doesn't have to be.
