241 Pages Audited in an Afternoon

By Scott Finney • May 25, 2026

[Image: top-down view of a laptop showing a spreadsheet audit with green and red status indicators. Generated with Grok.]

A client had a content hub with 241 pages built up over years of publishing. Nearly 1 million search impressions annually. But their hub page was ranking at position 65.

That's a weird spot. It means Google knows you're an authority on the topic but can't figure out how to organize what you've got. The content is there. The structure isn't.

Their digital team was planning a full hub redesign with structured subcategory navigation, but before they could build anything, they needed to know what they were working with. Every single URL needed a recommendation: keep it, optimize it, consolidate it, expand it, or evaluate it for removal. And each page needed to be mapped to one of six proposed subcategories to inform the new information architecture.

Doing that manually? 20 to 30 hours of an experienced analyst reading every page, cross-referencing search data, identifying overlap, and making judgment calls.

So we built a pipeline instead.

What We Built

A custom Python pipeline that combined three data layers with AI analysis to produce a complete audit in a single automated run.

A targeted Screaming Frog crawl of the content folder captured on-page data for each URL: word count, headings, body text, and near-duplicate flags. Data collection took about two hours across 241 URLs. Then we paired the crawl with 12 months of Google Search Console data (impressions, clicks, CTR, average position) and 12 months of GA4 data (sessions, page views, engagement rate) per URL.

Three data sources. One merged view per page.
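To make the merge step concrete, here is a minimal sketch of joining the three exports on URL. It assumes each export has already been loaded as a list of dicts sharing a `url` column. The column names and the `normalize_url` and `merge_sources` helpers are hypothetical, not the pipeline's actual code.

```python
def normalize_url(url: str) -> str:
    """Trim whitespace and a trailing slash so the three exports join cleanly."""
    return url.strip().rstrip("/")


def merge_sources(crawl_rows, gsc_rows, ga4_rows):
    """Left-join GSC and GA4 metrics onto the crawl: one dict per crawled URL."""
    merged = {}
    for row in crawl_rows:
        key = normalize_url(row["url"])
        merged[key] = {
            "url": key,
            "title": row.get("title", ""),
            "word_count": int(row.get("word_count", 0)),
            # Defaults for pages with no search or analytics data.
            "impressions": 0,
            "clicks": 0,
            "position": None,
            "sessions": 0,
        }
    for row in gsc_rows:
        key = normalize_url(row["url"])
        if key in merged:  # ignore URLs outside the crawled folder
            merged[key].update(
                impressions=int(row["impressions"]),
                clicks=int(row["clicks"]),
                position=float(row["position"]),
            )
    for row in ga4_rows:
        key = normalize_url(row["url"])
        if key in merged:
            merged[key]["sessions"] = int(row["sessions"])
    return list(merged.values())
```

Treating the crawl as the source of truth means a page with no search or analytics data still shows up in the audit, with zeros instead of silently disappearing.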

Two-Pass Analysis

Pass 1 was rule-based. No AI involved. Pages ranking in the top 10 with over 1,000 impressions? Keep. Pages under 300 words with under 500 impressions and under 50 sessions? Flag for removal evaluation. These are binary decisions — no judgment needed. That handled 77 of 241 URLs with zero ambiguity.
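The two rules described above can be sketched as a plain function. The thresholds come straight from the description. The field names, the flag labels, and the `rule_based_flag` helper are assumptions for illustration, and the real pipeline may well have had more rules than these two.

```python
from typing import Optional


def rule_based_flag(page: dict) -> Optional[str]:
    """Return a flag when a threshold rule applies, else None (leave it for Pass 2)."""
    position = page.get("position")
    impressions = page.get("impressions", 0)

    # Top-10 ranking with more than 1,000 impressions: keep as-is.
    if position is not None and position <= 10 and impressions > 1000:
        return "keep"

    # Thin content with little search or traffic signal: evaluate for removal.
    if (
        page.get("word_count", 0) < 300
        and impressions < 500
        and page.get("sessions", 0) < 50
    ):
        return "evaluate_for_removal"

    return None
```

Anything that comes back as `None` is ambiguous by definition, which makes it exactly the set of pages worth sending to the model.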

Pass 2 sent the remaining 164 pages individually to Claude's API with their full content, heading structure, and performance data. Each page came back with a flag recommendation, subcategory assignment, search intent classification, a rationale explaining why, and specific action notes.
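Here is a rough sketch of what one Pass 2 call could look like. The five flags and the five output fields come from the description above. The prompt wording, the JSON response format, the field names, and the `call_model` callable are all assumptions. With the official `anthropic` Python SDK, `call_model` could wrap `client.messages.create(...)` with `temperature=0` and return the first content block's text.

```python
import json
from typing import Callable

FLAGS = ["keep", "optimize", "consolidate", "expand", "evaluate_for_removal"]
REQUIRED_KEYS = ["flag", "subcategory", "search_intent", "rationale", "action_notes"]


def build_prompt(page: dict, subcategories: list) -> str:
    """Pack one page's content and performance data into a single prompt."""
    return "\n".join([
        "You are auditing one page from a content hub.",
        "Allowed flags: " + ", ".join(FLAGS),
        "Allowed subcategories: " + ", ".join(subcategories),
        "Reply with one JSON object containing exactly these keys: "
        + ", ".join(REQUIRED_KEYS),
        "",
        f"URL: {page['url']}",
        f"Headings: {' | '.join(page['headings'])}",
        f"Impressions: {page['impressions']}, "
        f"average position: {page['position']}, sessions: {page['sessions']}",
        "Body text:",
        page["body_text"],
    ])


def parse_response(text: str, subcategories: list) -> dict:
    """Validate the model's reply so a malformed answer fails loudly."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if data["flag"] not in FLAGS:
        raise ValueError(f"unknown flag: {data['flag']!r}")
    if data["subcategory"] not in subcategories:
        raise ValueError(f"unknown subcategory: {data['subcategory']!r}")
    return data


def analyze_page(page: dict, subcategories: list,
                 call_model: Callable[[str], str]) -> dict:
    """Send one page to the model and return the validated recommendation."""
    return parse_response(call_model(build_prompt(page, subcategories)), subcategories)
```

Passing the model call in as a function keeps the prompt and validation logic testable without an API key, and validating against the allowed flags and subcategories catches a reply that drifts outside the agreed vocabulary.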

A fuzzy title matching algorithm also identified consolidation candidates — pages with titles similar enough to suggest they were splitting traffic on the same search intent.
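The article doesn't say which matching algorithm was used, so the following is only one way to do it: a sketch built on `difflib.SequenceMatcher` from Python's standard library. The 0.85 threshold, the field names, and both helper functions are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def consolidation_candidates(pages, threshold=0.85):
    """Return (url_a, url_b, score) for every pair of near-identical titles."""
    pairs = []
    for first, second in combinations(pages, 2):
        score = title_similarity(first["title"], second["title"])
        if score >= threshold:
            pairs.append((first["url"], second["url"], round(score, 2)))
    # Strongest matches first, so the reviewer sees the clearest cases on top.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Comparing every pair is quadratic, but at 241 pages that is under 29,000 comparisons, which is negligible next to the crawl and the API calls.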

Quality Controls

Temperature 0 model settings for results that are as reproducible as the API allows
All AI output passed through text sanitization for clean deliverables
Rule-based flags took precedence over AI recommendations for threshold-based decisions
Near-duplicate detection from the crawl cross-validated the AI's consolidation recommendations
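Three of the controls above lend themselves to a short sketch: sanitizing text, letting rule-based flags win, and cross-checking consolidation calls against the crawl. The article doesn't spell out the exact sanitization steps, so the replacements below are assumptions, as are the function names and flag labels.

```python
import re
import unicodedata

# Typographic characters that tend to break CSV viewers and copy-paste.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def sanitize_text(text: str) -> str:
    """Normalize model output into plain, single-line text for a spreadsheet cell."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    text = re.sub(r"\s+", " ", text)  # collapse newlines, tabs, repeated spaces
    text = "".join(ch for ch in text if ch.isprintable())  # drop control characters
    return text.strip()


def final_flag(rule_flag, ai_flag):
    """A rule-based flag, when present, always overrides the AI recommendation."""
    return rule_flag if rule_flag is not None else ai_flag


def consolidation_confirmed(ai_flag, near_duplicate_urls):
    """True when an AI 'consolidate' call is backed by crawl near-duplicates."""
    return ai_flag == "consolidate" and len(near_duplicate_urls) > 0
```

A consolidation recommendation that the crawl doesn't back up isn't necessarily wrong, but it is a good candidate for the top of the human review pile.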

What We Found

Nearly half the content needed optimization, not removal. 109 pages had the right topics but misaligned titles, thin meta descriptions, or weak internal linking. Quick wins.

One article series was the clearest consolidation target — eight pages with nearly identical titles splitting traffic on the same search intent. Easy fix, immediate impact.

Content distribution was uneven. Two subcategories had 60-70 pages each, while the others had only 15-23, leaving them underrepresented relative to actual search demand.

And one page was carrying the entire folder. A single listicle generated over 12,000 sessions — more than many of the bottom 100 pages combined.

What the Client Got

A single CSV spreadsheet. Every URL had a clear recommendation, a rationale explaining why, and specific next steps. No interpretation needed. The subcategory mapping gave them a validated information architecture before a single wireframe was drawn.
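For the deliverable itself, a sketch of the export step might look like this, using `csv.DictWriter` from the standard library. The column names and their order are assumptions, as are the sort order and the `write_audit_csv` helper. The actual spreadsheet likely carried the performance metrics too.

```python
import csv

# Hypothetical column layout: one recommendation per URL.
COLUMNS = ["url", "flag", "subcategory", "search_intent", "rationale", "action_notes"]


def write_audit_csv(rows, handle):
    """Write one row per URL, grouped by subcategory so related pages sit together."""
    # extrasaction="ignore" drops working fields (body text, etc.) from the output.
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    ordered = sorted(rows, key=lambda r: (r.get("subcategory", ""), r.get("url", "")))
    for row in ordered:
        writer.writerow(row)
```

To write a file, open it with `newline=""` as the `csv` module recommends, for example `with open("audit.csv", "w", newline="", encoding="utf-8") as handle:`.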

Their digital team and design partner could start working immediately.

The Numbers

Pages audited: 241

Data sources integrated: 3 (crawl, GSC, GA4)

Data collection: ~2 hours

Pipeline build time: ~2 hours

Full audit execution: ~45 minutes

API cost: Under $2.00

Manual equivalent: 20-30 hours

Total wall time from start to finished spreadsheet: about half a day. And most of that was the crawl running in the background and the pipeline doing its thing — not a human staring at a screen.

Here's the thing: this isn't about replacing the analyst. The analyst still needs to review the recommendations, make final calls, and plan the implementation. But instead of spending 25 hours reading pages and filling in spreadsheets, they spend a couple of hours reviewing a completed audit and making decisions.

That's the shift. AI handles the reading, cross-referencing, and initial classification. Humans handle the judgment, strategy, and execution.

Why This Pattern Matters

This was a content audit, but the pattern applies everywhere. Any time you've got a large set of items that need to be individually evaluated against multiple data sources and classified — that's a pipeline.

Product catalogs. Internal documentation. Support ticket backlogs. Compliance reviews. The shape of the problem is the same: too many items, too many variables, not enough hours.

The tools exist to do this right now. Not next year. Not when the technology matures. Right now, today, for under two dollars.

What would you do with 25 hours back?
