The audit that never ends
I've seen content audits die the same way every time. Someone exports a sitemap with 800 URLs, opens a spreadsheet, starts labeling rows, gets to page 47, and then the spreadsheet sits untouched for three months while the next fire takes priority.
The audit wasn't bad. The process was.
Content audits fail because teams try to evaluate every page one at a time, from scratch, with no scoring criteria and no clear decision rules. Without a framework, every page becomes a judgment call, and judgment calls are exhausting at scale.
The good news: a large-site content audit is a data problem, not a willpower problem. With the right structure, you can triage a 500-page site in a few focused sessions and come out with a prioritized action plan you can actually execute.
Start with the crawl, not the spreadsheet
The first instinct is to open a spreadsheet and start making decisions. Resist it. Before you can triage content, you need to know what you actually have, and that means a technical crawl.
Use a crawler like Screaming Frog, Sitebulb, or Ahrefs Site Audit. Set it to crawl your full domain and export, at minimum, each page's URL, title, word count, and last-modified date.
This raw export becomes your audit foundation. If your CMS can export a content list with last-edited dates and author data, merge that in too. The more context each row has before you start evaluating, the faster your decisions will be.
One practical note: most sites have more URLs than editors expect. E-commerce product pages, tag archives, paginated URLs, filtered search results, and author pages all add up. Scope your audit to content that someone intentionally created: service pages, news articles, guides, and landing pages. Filter out system-generated URLs before you start triaging.
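Here is a minimal sketch of that scoping filter, assuming your crawl export is a plain list of URLs. The path patterns are hypothetical placeholders, not a standard; replace them with whatever your CMS actually generates.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns for system-generated URLs; adjust to your own site.
SYSTEM_PATH_PATTERNS = [
    re.compile(r"^/tag/"),        # tag archives
    re.compile(r"^/author/"),     # author pages
    re.compile(r"/page/\d+/?$"),  # paginated listings
    re.compile(r"^/search"),      # internal search results
]


def is_system_generated(url: str) -> bool:
    """Return True if the URL looks system-generated rather than authored."""
    parts = urlsplit(url)
    # Assumption: filtered or sorted views carry a query string.
    if parts.query:
        return True
    return any(pattern.search(parts.path) for pattern in SYSTEM_PATH_PATTERNS)


def scope_audit(urls: list[str]) -> list[str]:
    """Keep only URLs that someone intentionally created."""
    return [url for url in urls if not is_system_generated(url)]
```

Run it on the raw export first and spot-check what it removed before trusting it; a pattern that is too greedy can silently drop real content.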
Connect traffic and search data
A URL without context is hard to make decisions about. Before you start categorizing, add two data columns from your analytics and SEO tools:
Pageviews (last 12 months): Pull from GA4. This tells you whether real users are finding and reading the page. Sort descending to identify your highest-traffic content immediately.
Organic clicks or impressions (last 12 months): Pull from Google Search Console. A page with low traffic but high impressions may have SEO potential worth preserving. A page with zero traffic and zero impressions for 12 months is a strong delete candidate.
You don't need every possible metric. Traffic and search visibility are the two signals that matter most for content decisions. Everything else is supporting context.
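The merge itself can be sketched in a few lines. This assumes you have already exported the analytics and Search Console numbers into dictionaries keyed by the same URL form as your crawl (in practice you may need to normalize paths against full URLs first). The function names and the "low traffic, high impressions" cutoffs are illustrative assumptions, not values from any tool.

```python
def merge_metrics(crawl_rows, pageviews, search):
    """Attach 12-month traffic and search metrics to each crawl row.

    crawl_rows: list of dicts with at least a "url" key
    pageviews:  dict mapping url -> views
    search:     dict mapping url -> (clicks, impressions)
    """
    merged = []
    for row in crawl_rows:
        url = row["url"]
        clicks, impressions = search.get(url, (0, 0))
        merged.append({
            **row,
            "pageviews": pageviews.get(url, 0),  # absent from analytics -> 0
            "clicks": clicks,
            "impressions": impressions,
        })
    return merged


def is_delete_candidate(row):
    """Zero traffic and zero impressions over the period."""
    return row["pageviews"] == 0 and row["impressions"] == 0


def has_seo_potential(row, low_views=100, high_impressions=1000):
    """Low traffic but high impressions. Both cutoffs are hypothetical."""
    return row["pageviews"] < low_views and row["impressions"] >= high_impressions
```

Treating a missing URL as zero is a deliberate choice here: a page that appears in neither report is exactly the kind of page the audit should surface.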
The four-bucket triage framework
Once your spreadsheet has URL, title, word count, last modified, traffic, and search visibility, you have enough information to sort every page into one of four buckets.
Keep: The page is accurate, well-trafficked, and serving its purpose. No immediate action required. Schedule a review date 12 months out.
Update: The page has traffic or search value, but the content is outdated, thin, or underperforming relative to its potential. It needs a rewrite, new information, or structural improvements.
Consolidate: You have multiple pages covering similar ground, splitting traffic and confusing search intent. Pick the strongest one, redirect the others to it, and merge the best content into one authoritative page.
Delete: The page serves no audience, has no traffic, no search visibility, and no reason to exist. Archive or delete it and set a 301 redirect to the closest relevant page.
Apply these buckets in order of effort: start by marking obvious deletes (zero traffic, last modified four-plus years ago, word count under 100), then flag consolidation candidates (pages with nearly identical titles or topics), then split your remaining inventory between keep and update.
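The first pass can be roughed out in code. This sketch assumes each page is a dict with `url`, `title`, `pageviews`, `word_count`, and a `last_modified` date. It treats the three delete signals as all required, which is my reading rather than something the rule states, and the title-similarity cutoff of 0.85 is a hypothetical starting point.

```python
from datetime import date
from difflib import SequenceMatcher


def is_obvious_delete(page, today):
    """Zero traffic, untouched for four-plus years, and under 100 words.

    Assumption: all three signals must hold at once.
    """
    age_years = (today - page["last_modified"]).days / 365.25
    return page["pageviews"] == 0 and age_years >= 4 and page["word_count"] < 100


def similar_title_pairs(pages, threshold=0.85):
    """Return URL pairs whose titles are nearly identical.

    threshold is a hypothetical cutoff on difflib's 0..1 similarity ratio.
    Compares every pair, which is fine for a few hundred pages.
    """
    pairs = []
    for i, first in enumerate(pages):
        for second in pages[i + 1:]:
            ratio = SequenceMatcher(
                None, first["title"].lower(), second["title"].lower()
            ).ratio()
            if ratio >= threshold:
                pairs.append((first["url"], second["url"]))
    return pairs
```

Title matching only catches the easy cases. Pages that cover the same topic under different titles still need a human eye.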
Aim to make each decision in under 30 seconds per page. If a page is taking longer, add it to a "review later" list and keep moving. Don't let ambiguous pages stall the whole audit.
The spreadsheet scoring model
For pages that aren't obviously keep or delete, a simple scoring model prevents decision fatigue. Score each page on five criteria, each from 1 to 3:
| Criterion | 1 | 2 | 3 |
|---|---|---|---|
| Traffic (last 12 months) | Under 100 views | 100-1,000 | Over 1,000 |
| Search visibility | No impressions | Some impressions | Ranking in top 20 |
| Content accuracy | Outdated or wrong | Some outdated info | Fully accurate |
| Content depth | Under 300 words, thin | Adequate | Comprehensive |
| Strategic alignment | Off-brand or irrelevant | Partially aligned | Core content |
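Add the five scores for a total between 5 and 15. Low totals point toward delete or consolidate; high totals point toward keep.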
This model takes under a minute per page and gives you a defensible rationale when someone asks why you archived 120 pages. "They scored under 7 across five criteria" is a much easier conversation than "they felt old."
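If you'd rather compute totals than tally them by hand, the table translates into a short function. In this sketch the traffic and search scores come from the data columns, while accuracy, depth, and alignment stay as manual 1-to-3 ratings. The `best_position` parameter and the function names are my own, and the archive cutoff of 7 is taken from the example quote above rather than a fixed rule.

```python
ARCHIVE_BELOW = 7  # from the "scored under 7" example; tune for your site


def traffic_score(views):
    """Under 100 views -> 1, 100 to 1,000 -> 2, over 1,000 -> 3."""
    if views < 100:
        return 1
    if views <= 1000:
        return 2
    return 3


def search_score(impressions, best_position=None):
    """No impressions -> 1, some impressions -> 2, ranking in top 20 -> 3.

    best_position: best average ranking position, or None if unknown.
    """
    if impressions == 0:
        return 1
    if best_position is not None and best_position <= 20:
        return 3
    return 2


def total_score(views, impressions, best_position, accuracy, depth, alignment):
    """Sum the five criteria; the result falls between 5 and 15."""
    for rating in (accuracy, depth, alignment):
        if rating not in (1, 2, 3):
            raise ValueError("manual ratings must be 1, 2, or 3")
    return (
        traffic_score(views)
        + search_score(impressions, best_position)
        + accuracy
        + depth
        + alignment
    )


def is_archive_candidate(total):
    return total < ARCHIVE_BELOW
```

Every criterion counts equally here. If traffic matters more to you than depth, multiply that term by a weight, but write the weights down so the rationale stays defensible.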
How to prioritize execution
Once every page is bucketed and scored, you have a decision — but not yet a work plan. Prioritization is the step most audits skip, which is why backlogs stay backlogs.
Start with deletes
Deletes are fast and create immediate value. A leaner site with no broken links is quicker for search engines to crawl, concentrates link equity on fewer pages, and is easier for editors to maintain. Before you write a single word of new content, process your delete list: archive pages, set redirects, and remove them from the sitemap.
Consolidate second
Consolidations take more effort (someone has to actually merge content and choose which page survives), but the SEO payoff is usually worth it. One authoritative 1,200-word page built from three thin 400-word pages tends to perform better in search than the three originals combined.
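Choosing the survivor and building the redirect list is the mechanical part, and it can be sketched like this. The assumption here is that "strongest" means most pageviews; you might prefer backlinks or rankings instead. The function name and the dict fields are illustrative.

```python
def plan_consolidation(group):
    """Pick the surviving page and map every other URL to it.

    group: list of page dicts ("url", "pageviews") covering the same topic.
    Returns (survivor_url, redirects), where redirects maps old URL -> survivor URL.
    Assumption: the page with the most pageviews is the strongest.
    """
    survivor = max(group, key=lambda page: page["pageviews"])
    redirects = {
        page["url"]: survivor["url"]
        for page in group
        if page["url"] != survivor["url"]
    }
    return survivor["url"], redirects
```

The resulting mapping is your 301 list. The content merge itself is still an editorial job.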
Triage updates by impact
For the update bucket, sort by traffic combined with strategic alignment. A high-traffic page that's partially outdated is worth prioritizing over a strategically aligned page that nobody reads yet. Work top-down.
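One way to express that ordering, assuming each page already carries its 1-to-3 `traffic_score` and `alignment` ratings from the scoring model: sort by traffic band first, use alignment to break ties, and fall back to raw pageviews. The exact key is my interpretation of "traffic combined with strategic alignment", not the only reasonable one.

```python
def prioritize_updates(pages):
    """Order the update queue: traffic band, then alignment, then raw views.

    Each page is a dict with "traffic_score" (1-3), "alignment" (1-3),
    and "pageviews". Highest priority comes first.
    """
    return sorted(
        pages,
        key=lambda page: (page["traffic_score"], page["alignment"], page["pageviews"]),
        reverse=True,
    )
```

With this key, a high-traffic page with partial alignment still lands above a perfectly aligned page that nobody reads.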
Dedicate recurring time, not project time
The most common mistake when acting on an audit is treating it as a project with a start and end date. It's not. Assign a recurring block, say two hours every other week, to work through the update queue. Pages in the "keep" bucket get reviewed on a rolling 12-month cycle. This keeps the audit alive instead of letting it expire as soon as the initial push is over.
A repeatable audit cadence
One audit is useful. A recurring audit process is strategic.
For most organizations, this rhythm works:
Quarterly: Review pages published in the last 90 days. Are they performing as expected? Do any need early intervention?
Annually: Full triage of the site inventory. Add traffic and search data, re-score borderline pages, update the delete and consolidate queues.
By trigger: Any time a significant site change happens (CMS migration, rebrand, service restructure), run a targeted audit of the affected sections before and after.
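The quarterly pull is easy to automate if your CMS export includes a publish date. A small sketch, assuming each page dict has a `published` date; the field name is hypothetical.

```python
from datetime import date


def published_within(pages, today, days=90):
    """Return pages published in the last `days` days, inclusive."""
    return [
        page for page in pages
        if 0 <= (today - page["published"]).days <= days
    ]
```

Feed the result into the same traffic and search merge you use for the annual audit so new pages are judged on the same columns.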
Document the cadence in your governance policy so it survives staff turnover. The institutional memory of "why we deleted those 80 pages in 2024" needs to live somewhere other than one person's email archive.
Content audit checklist
Use this before you start each audit cycle.
A content audit is the foundation of every content strategy. Get in touch if you'd like help building an audit process your team will actually maintain.