Strategy · 12 min read
How to Run a Content Audit on a Large Website Without Losing Your Mind

Most content audits stall before they finish. Here's a repeatable system for crawling, categorizing, and triaging every page without burning out.

Excelle Escalada, Digital Experience Architect
Sep 8, 2024

The audit that never ends

I've seen content audits die the same way every time. Someone exports a sitemap with 800 URLs, opens a spreadsheet, starts labeling rows, gets to page 47, and then the spreadsheet sits untouched for three months while the next fire takes priority.

The audit wasn't bad. The process was.

Content audits fail because teams try to evaluate every page one at a time, from scratch, with no scoring criteria and no clear decision rules. Without a framework, every page becomes a judgment call, and judgment calls are exhausting at scale.

The good news: a large-site content audit is a data problem, not a willpower problem. With the right structure, you can triage a 500-page site in a few focused sessions and come out with a prioritized action plan you can actually execute.

Start with the crawl, not the spreadsheet

The first instinct is to open a spreadsheet and start making decisions. Resist it. Before you can triage content, you need to know what you actually have, and that means a technical crawl.

Use a crawler like Screaming Frog, Sitebulb, or Ahrefs Site Audit. Set it to crawl your full domain and export:

  • All indexed URLs (HTML pages only, not assets)
  • Page title and H1 for each URL
  • HTTP status code (200, 301, 404, etc.)
  • Word count
  • Last modified date (if available in HTML or via CMS export)
  • Inbound internal link count

This raw export becomes your audit foundation. If your CMS can export a content list with last-edited dates and author data, merge that in too. The more context each row has before you start evaluating, the faster your decisions will be.

    One practical note: most sites have more URLs than editors expect. E-commerce product pages, tag archives, paginated URLs, filtered search results, and author pages all add up. Scope your audit to content that someone intentionally created: service pages, news articles, guides, and landing pages. Filter out system-generated URLs before you start triaging.
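Filtering out system-generated URLs is usually a pattern-matching job. A minimal sketch, assuming a list of crawled URLs and a handful of illustrative URL patterns (adjust both to your CMS and site structure):

```python
import re

# Hypothetical patterns for system-generated pages; tune these to your CMS.
SYSTEM_PATTERNS = [r"/tag/", r"/author/", r"/page/\d+", r"\?filter=", r"/search"]

def is_system_url(url: str) -> bool:
    """True if the URL looks auto-generated rather than intentionally created."""
    return any(re.search(p, url) for p in SYSTEM_PATTERNS)

crawl = [
    "https://example.com/services/web-design",
    "https://example.com/tag/news",
    "https://example.com/blog/page/3",
    "https://example.com/guides/content-audits",
]

# Keep only editorial content for triage.
editorial = [u for u in crawl if not is_system_url(u)]
```

Running the patterns once over the whole export is far faster than deleting rows by hand, and the pattern list doubles as documentation of what you excluded.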

    Connect traffic and search data

    A URL without context is hard to make decisions about. Before you start categorizing, add two data columns from your analytics and SEO tools:

    Pageviews (last 12 months): Pull from GA4. This tells you whether real users are finding and reading the page. Sort descending to identify your highest-traffic content immediately.

    Organic clicks or impressions (last 12 months): Pull from Google Search Console. A page with low traffic but high impressions may have SEO potential worth preserving. A page with zero traffic and zero impressions for 12 months is a strong delete candidate.

    You don't need every possible metric. Traffic and search visibility are the two signals that matter most for content decisions. Everything else is supporting context.
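Joining the analytics and search data onto the crawl rows is a simple merge keyed on URL. A sketch with stand-in data structures (the column and key names here are assumptions; match them to whatever your GA4 and Search Console exports actually produce):

```python
# Crawl rows plus two lookup tables keyed by URL, as exported from GA4 / GSC.
crawl_rows = [
    {"url": "/guides/content-audits", "title": "Content Audits", "word_count": 1800},
    {"url": "/old-news-2019", "title": "Old News", "word_count": 220},
]
ga4_pageviews = {"/guides/content-audits": 4200}  # pageviews, last 12 months
gsc_metrics = {"/guides/content-audits": {"clicks": 310, "impressions": 9800}}

# Merge: missing entries default to zero, which is itself a signal.
for row in crawl_rows:
    row["pageviews_12mo"] = ga4_pageviews.get(row["url"], 0)
    gsc = gsc_metrics.get(row["url"], {})
    row["organic_clicks_12mo"] = gsc.get("clicks", 0)
    row["impressions_12mo"] = gsc.get("impressions", 0)
```

A URL that picks up zeros from both lookups is exactly the "zero traffic, zero impressions" delete candidate described above.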

    The four-bucket triage framework

    Once your spreadsheet has URL, title, word count, last modified, traffic, and search visibility, you have enough information to sort every page into one of four buckets.

    Keep: The page is accurate, well-trafficked, and serving its purpose. No immediate action required. Schedule a review date 12 months out.

    Update: The page has traffic or search value, but the content is outdated, thin, or underperforming relative to its potential. It needs a rewrite, new information, or structural improvements.

    Consolidate: You have multiple pages covering similar ground, splitting traffic and confusing search intent. Pick the strongest one, redirect the others to it, and merge the best content into one authoritative page.

    Delete: The page serves no audience, has no traffic, no search visibility, and no reason to exist. Archive or delete it and set a 301 redirect to the closest relevant page.

    Apply these buckets in order of effort: start by marking obvious deletes (zero traffic, last modified four-plus years ago, word count under 100), then flag consolidation candidates (pages with nearly identical titles or topics), then split your remaining inventory between keep and update.

    Aim to make each decision in under 30 seconds per page. If a page is taking longer, add it to a "review later" list and keep moving. Don't let ambiguous pages stall the whole audit.
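The first-pass rules above are mechanical enough to encode. A sketch of the triage order, with illustrative thresholds (the cutoffs and field names are assumptions, not fixed rules):

```python
from datetime import date

def triage(pageviews: int, impressions: int, word_count: int,
           last_modified: date, duplicate_topic: bool,
           today: date = date(2024, 9, 8)) -> str:
    """First-pass bucket assignment: deletes, then consolidations, then the rest."""
    years_old = (today - last_modified).days / 365
    # Obvious deletes: no audience, and either very old or near-empty.
    if pageviews == 0 and impressions == 0 and (years_old >= 4 or word_count < 100):
        return "delete"
    # Pages sharing a topic with a stronger page get merged, not rewritten.
    if duplicate_topic:
        return "consolidate"
    # Has an audience but is stale or thin: rewrite candidate.
    if pageviews > 0 and (years_old > 2 or word_count < 300):
        return "update"
    return "keep"
```

Anything the rules can't settle cleanly is exactly the "review later" list: score it with the model below rather than agonizing in the moment.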

    The spreadsheet scoring model

    For pages that aren't obviously keep or delete, a simple scoring model prevents decision fatigue. Score each page on five criteria, each from 1 to 3:

Traffic (last 12 months): 1 = under 100 views · 2 = 100-1,000 · 3 = over 1,000
Search visibility: 1 = no impressions · 2 = some impressions · 3 = ranking in top 20
Content accuracy: 1 = outdated or wrong · 2 = some outdated info · 3 = fully accurate
Content depth: 1 = under 300 words, thin · 2 = adequate · 3 = comprehensive
Strategic alignment: 1 = off-brand or irrelevant · 2 = partially aligned · 3 = core content

    Total scores:

  • 5-7: Strong delete or deprioritize candidate
  • 8-10: Update or consolidate
  • 11-15: Keep, with light review

This model takes under a minute per page and gives you a defensible rationale when someone asks why you archived 120 pages. "They scored under 7 on five weighted criteria" is a much easier conversation than "they felt old."
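The scoring model maps directly to a few lines of code, which is handy if you want to compute totals in a script rather than a spreadsheet formula. A sketch of the five-criteria total and its action bands:

```python
def score_page(traffic: int, search: int, accuracy: int,
               depth: int, alignment: int) -> str:
    """Sum the five 1-3 criterion scores and map the total to an action band."""
    for s in (traffic, search, accuracy, depth, alignment):
        if not 1 <= s <= 3:
            raise ValueError("each criterion is scored 1 to 3")
    total = traffic + search + accuracy + depth + alignment
    if total <= 7:
        return "delete or deprioritize"
    if total <= 10:
        return "update or consolidate"
    return "keep"
```

Because the bands cover 5-7, 8-10, and 11-15, every valid total lands in exactly one bucket, so no page falls through the cracks.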

    How to prioritize execution

    Once every page is bucketed and scored, you have a decision — but not yet a work plan. Prioritization is the step most audits skip, which is why backlogs stay backlogs.

    Start with deletes

    Deletes are fast and create immediate value. A 404-free, low-page-count site crawls faster, has cleaner link equity distribution, and is easier for editors to maintain. Before you write a single word of new content, process your delete list: archive pages, set redirects, and remove them from the sitemap.

    Consolidate second

    Consolidations take more effort (someone has to actually merge content and choose which page survives) but the SEO payoff is immediate. Consolidating three thin 400-word pages into one 1,200-word authoritative page almost always performs better in search than the three originals combined.

    Triage updates by impact

    For the update bucket, sort by traffic combined with strategic alignment. A high-traffic page that's partially outdated is worth prioritizing over a strategically aligned page that nobody reads yet. Work top-down.
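That ordering is just a two-key descending sort. A sketch, reusing the 1-3 alignment score from the model above (the field names are assumptions):

```python
# Update queue: traffic is the primary key, strategic alignment breaks ties.
update_queue = [
    {"url": "/guide-a", "pageviews_12mo": 120, "alignment": 3},
    {"url": "/guide-b", "pageviews_12mo": 4500, "alignment": 2},
    {"url": "/guide-c", "pageviews_12mo": 4500, "alignment": 3},
]

update_queue.sort(key=lambda r: (r["pageviews_12mo"], r["alignment"]), reverse=True)
```

After the sort, the high-traffic pages sit at the top, and among equal-traffic pages the more strategically aligned one comes first, so working top-down matches the prioritization rule above.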

    Dedicate recurring time, not project time

    The hardest part of acting on an audit is treating it as a project with a start and end date. It's not. Assign a recurring block — say, two hours every other week — to work through the update queue. Pages in the "keep" bucket get reviewed on a rolling 12-month cycle. This keeps the audit alive instead of letting it expire as soon as the initial push is over.

    A repeatable audit cadence

    One audit is useful. A recurring audit process is strategic.

    For most organizations, this rhythm works:

    Quarterly: Review pages published in the last 90 days. Are they performing as expected? Do any need early intervention?

    Annually: Full triage of the site inventory. Add traffic and search data, re-score borderline pages, update the delete and consolidate queues.

    By trigger: Any time a significant site change happens (CMS migration, rebrand, service restructure), run a targeted audit of the affected sections before and after.

    Document the cadence in your governance policy so it survives staff turnover. The institutional memory of "why we deleted those 80 pages in 2024" needs to live somewhere other than one person's email archive.

    Content audit checklist

    Use this before you start each audit cycle.

  • Export full crawl (URLs, titles, H1s, status codes, word count, last modified, inbound links)
  • Merge CMS content export for author and publish-date data
  • Add GA4 pageviews (last 12 months)
  • Add Google Search Console organic clicks and impressions (last 12 months)
  • Filter out system-generated URLs (pagination, archive pages, tags)
  • Apply four-bucket triage (keep / update / consolidate / delete)
  • Score ambiguous pages using the five-criteria model
  • Process delete queue: archive, redirect, remove from sitemap
  • Build consolidation plan: identify destination pages, map redirects
  • Sort update queue by traffic and strategic alignment
  • Assign recurring time blocks for execution
  • Document decisions and audit date in governance log

A content audit is the foundation of every content strategy. Get in touch if you'd like help building an audit process your team will actually maintain.
