Get your CARS & BS in order before you migrate. πŸ˜‚

This simple mnemonic covers the six essential data sources for building a complete URL list:

  • Crawl
  • Analytics
  • Redirects (existing)
  • Sitemap
  • Backlinks
  • Search Console

Gather from all six, and you won’t miss a URL that matters.

URL Gathering Task Purpose
Crawl domain for HTML URLs Discover all live pages
Categorize by status code Identify redirect needs
Gather backlinked URLs Preserve SEO equity
Crawl XML sitemap Capture declared important pages
Export Search Console data Find indexed URLs
Gather analytics URLs Identify traffic-generating pages
Audit existing redirect tables Prevent redirect chains
Unify all datasets Create comprehensive redirect list

What Datasets Should I Use to Compile Comprehensive URL Lists?

Best Practice

A successful site migration requires gathering URLs from multiple sources to ensure no important page is missed. Relying on a single source will leave gaps in your redirect coverage.

Essential Data Sources

Source What It Captures
Domain Crawl All discoverable HTML URLs
XML Sitemap URLs you’ve declared important
Google Search Console URLs Google knows about
Analytics URLs with actual traffic
Backlink Tools URLs with external links
Existing Redirect Tables Current redirect mappings

The Multi-Source Approach

Each source captures URLs the others might miss:

  • Crawlers miss orphan pages not linked internally
  • Sitemaps may be outdated or incomplete
  • Search Console only shows indexed URLs
  • Analytics misses pages with zero traffic
  • Backlink tools focus on externally linked pages
Gather from all available sources, then deduplicate. It's far better to have redundant data than to miss a high-value URL that loses traffic or SEO equity after migration.

How Do I Crawl the Domain to Gather HTML URLs?

Critical First Step

Start by crawling your entire domain using a tool like Screaming Frog, Sitebulb, or similar web crawlers. This discovers all HTML pages that are linked within your site structure.

Crawl Configuration

Recommended settings:

  • Crawl depth: Unlimited (or high enough to reach all pages)
  • Respect robots.txt: Disable for migration purposes (you need ALL URLs)
  • Follow internal links: Enabled
  • Crawl outside start folder: Disabled (stay on your domain)
  • Store HTML: Optional but useful for content comparison

What to Extract

Export the following from your crawl:

URL Address
Status Code
Indexability
Canonical URL
Meta Robots
Title

Tips for Handling Large Sites

For sites with 100,000+ URLs:

  1. Segment by subdirectory: Crawl /blog/, /products/, /pages/ separately
  2. Use list mode: Feed known URLs directly instead of discovering
  3. Increase memory allocation: Screaming Frog may need 8GB+ RAM
  4. Run overnight: Large crawls can take hours
⚠️ Crawl the Live Site
Always crawl your current production site before migration begins. Crawling a staging or development environment will miss URLs that only exist in production.
Run your crawl at least twice: once at the start of migration planning and once immediately before launch. URLs change during development, and you need the most current data.

How Should I Categorize URLs by Status Code?

Essential Organization

After crawling, categorize all discovered URLs by their HTTP status code. Each category requires different handling in your redirect strategy.

Status Code Categories

200 OK URLs: Your primary redirect source list

Subcategory Description Action
Indexable Can appear in search results High priority redirects
Non-Indexable Blocked from indexing Evaluate redirect need
Canonicalized Points to another URL Redirect to canonical target
NoIndex Meta noindex tag present Lower priority redirects
UTM Parameters Marketing tracking URLs Usually exclude from redirects
Filter Parameters Faceted navigation URLs Usually exclude from redirects

301/302 Redirect URLs: Already redirecting

  • Document existing redirect destinations
  • Ensure new redirects point to final destinations
  • Avoid creating redirect chains

404 Not Found URLs: Broken but potentially important

  • Check for backlinks pointing to these URLs
  • Review Search Console for indexed 404s
  • May need redirects if they have SEO value
Create separate spreadsheet tabs or files for each status code category. This makes it easier to apply different redirect strategies to each group.

Should I Include URLs with Status Codes Other Than 200?

Yes: Critical for Complete Coverage

Many migration projects focus only on 200 status pages, but 301/302 and 404 URLs are equally important for maintaining SEO equity and user experience.

Why 301/302 URLs Matter

Existing redirects represent URLs that once had value:

  • External sites may still link to the old URLs
  • Search engines may have the old URLs indexed
  • Users may have bookmarked the old URLs

If you ignore existing redirects:

Old URL β†’ Current Redirect β†’ New Site (broken)

With proper handling:

Old URL β†’ New Site (direct)

Why 404 URLs Matter

A 404 status doesn’t mean a URL is worthless:

404 Scenario Redirect Need
Has backlinks from external sites Yes: preserve link equity
Appears in Search Console Yes: Google knows about it
Shows traffic in analytics Yes: users are looking for it
Recently deleted content Maybe: evaluate relevance
Never had traffic or links No: safe to ignore

Gathering 404 Data

Export 404s from:

  • Screaming Frog crawl results
  • Google Search Console coverage report
  • Server access logs
  • Analytics (pages with zero pageviews but sessions)
⚠️ Don't Redirect Everything
Not every 404 needs a redirect. Focus on 404s that have backlinks, search impressions, or represent content that moved rather than content that was intentionally removed.
Cross-reference your 404 list with Ahrefs or Search Console data. Prioritize redirects for 404 URLs that have external backlinks or recent search impressions.

What URL Variations Should I Account For?

Common Migration Pitfall

The same page can be accessed via multiple URL variations. Missing any variation means broken links and lost traffic.

Critical URL Variations

Variation Type Example A Example B
www vs non-www www.example.com/page example.com/page
Trailing slash /products/ /products
Capitalization /Products/Widget /products/widget
URL encoding /search?q=hello%20world /search?q=hello world
Protocol https:// http://
Index files /folder/index.html /folder/

How Variations Cause Problems

External links and bookmarks may use any variation:

Backlink uses: example.com/Blog/Post-Title
Your redirect: www.example.com/blog/post-title

Result: 404 error, redirect not matched

Gathering All Variations

  1. Check backlink reports: External sites use inconsistent formats
  2. Review server logs: See actual requested URLs
  3. Test manually: Try common variations of important pages
  4. Search Console: Shows URL variations Google has encountered

Standardization Strategy

Decide on your canonical format, then redirect all variations:

Old Path Redirect To
/Products/ /products
/PRODUCTS/ /products
/products /products
/Products /products
Use case-insensitive matching if your platform supports it. Otherwise, generate redirects for all known case variations of high-traffic URLs.
Preserve SEO Equity

URLs with external backlinks carry SEO value that transfers through 301 redirects. Backlink analysis tools reveal which URLs have this equity.

Tool Key Feature
Ahrefs Site Explorer β†’ Best by Links
Semrush Backlink Analytics β†’ Indexed Pages

Export Process (General Steps)

  1. Enter your domain in the tool’s site analysis feature
  2. Navigate to the pages or URLs report (shows which pages receive backlinks)
  3. Export the full list of pages with backlinks
  4. Filter to your domain’s URLs only

Key Data Points to Capture

Data Point Purpose
Target URL The URL receiving backlinks
Referring Domains Number of unique sites linking
Total Backlinks Overall link count
Link Quality Score Authority indicator (varies by tool)

Prioritization Framework

Not all backlinked URLs are equal:

Referring Domains Priority Action
50+ Critical Must redirect
10-49 High Should redirect
2-9 Medium Redirect if practical
1 Low Evaluate individually

Most backlink tools show links pointing to URLs that return 404:

  1. Look for a status code filter or broken backlinks report
  2. Filter to show only 404 URLs
  3. Export these URLs (they need redirects despite being broken)
⚠️ Backlinks to Non-Existent Pages
External sites often link to URLs that no longer exist on your site. These 404 URLs with backlinks should be redirected to the most relevant existing page to capture the link equity.
Export backlink data monthly during migration planning. New backlinks appear regularly, and you want to capture them all before launch.

Why Should I Crawl the XML Sitemap?

Capture Declared Important URLs

Your XML sitemap represents URLs you’ve explicitly told search engines are important. These should all be included in your redirect planning.

What Sitemaps Reveal

Sitemap Element Migration Use
URL list Pages you consider important
Last modified dates Recently updated content
Priority values Your content hierarchy
Change frequency Content update patterns

Extracting Sitemap URLs

Method 1: Direct download

https://example.com/sitemap.xml
https://example.com/sitemap_index.xml

Method 2: Screaming Frog

  1. Mode β†’ List
  2. Upload β†’ Download Sitemap
  3. Enter sitemap URL
  4. Crawl to validate URLs

Method 3: Search Console

  • Sitemaps report shows submitted URLs
  • Index coverage shows which are indexed

Sitemap vs Crawl Comparison

Compare your sitemap URLs against crawl results:

Scenario Meaning Action
In sitemap, found in crawl Normal Include in redirects
In sitemap, not in crawl Orphan page Verify page exists, include
In crawl, not in sitemap Missing from sitemap Include in redirects
If your sitemap is auto-generated by your CMS, it may be more current than a crawl. Always gather both and deduplicate.

How Do I Export URLs from Google Search Console?

Find What Google Knows

Google Search Console reveals URLs that Google has discovered and indexed, regardless of whether they appear in your crawl or sitemap.

Exporting URL Data

From Coverage Report:

  1. Navigate to Indexing β†’ Pages
  2. Click each status category (Valid, Excluded, etc.)
  3. Export the URL list for each category

From Performance Report:

  1. Navigate to Performance
  2. Click Pages tab
  3. Export to see URLs with impressions/clicks

Coverage Categories to Export

Category Why It Matters
Valid (Indexed) URLs appearing in search results
Valid with warnings Indexed but have issues
Excluded - Crawled not indexed Google found but didn’t index
Excluded - Discovered not indexed Google knows about but hasn’t crawled
Excluded - Redirect URLs Google sees as redirecting

Performance Data Value

URLs with search impressions or clicks are proven valuable:

  • Users are finding them via search
  • Google considers them relevant for queries
  • Losing these URLs means losing traffic

Export the last 16 months of data for the fullest picture.

⚠️ Search Console URL Limits
Search Console exports are limited to 1,000 rows in the UI. Use the Search Console API or Google's Bulk Data Export (BigQuery) for complete data on large sites.
Pay special attention to the "Excluded - Redirect" category. These show redirects Google has already detected. Ensure they're accounted for in your new redirect plan.

For a more powerful way to work with Search Console data, consider using SEOGets. Their Indexing report provides a more sophisticated view of your indexed pages than the native Search Console interface, making it easier to identify and export the URLs you need for redirect planning.

SEOGets Indexing Report

How Do I Gather URLs from Analytics?

Identify Traffic-Generating Pages

Analytics data shows which URLs actually receive visitor traffic. These are your highest-priority redirect candidates.

Exporting from Google Analytics (GA4)

  1. Navigate to Reports β†’ Engagement β†’ Pages and screens
  2. Set date range to last 12-16 months
  3. Export the full page path report

Key Metrics to Capture

Metric Priority Indicator
Sessions Overall traffic volume
Users Unique visitor count
Engagement rate Content quality signal
Conversions Business value

Creating Priority Tiers

Segment URLs by traffic volume:

Monthly Sessions Priority Redirect Treatment
1,000+ Critical Must redirect, verify destination
100-999 High Must redirect
10-99 Medium Should redirect
1-9 Low Redirect if practical
0 Lowest Redirect only if backlinks exist

Don’t Forget Landing Pages

Filter for pages where users enter your site:

  • These are often linked externally or bookmarked
  • Losing landing pages has outsized traffic impact
  • Prioritize redirects for top landing pages
Compare analytics URLs against your crawl. Pages with traffic that weren't found in the crawl may be orphaned content that still needs redirects.

Where Do I Find Existing 301 Redirect Tables?

Prevent Redirect Chains

Before creating new redirects, you must know what redirects already exist. Ignoring existing redirects creates chains that hurt SEO and performance.

Common Redirect Sources

Source Where to Find Export Method
CMS Redirect Admin WordPress, Shopify, etc. admin panel Built-in export or database query
Redirect Plugins Yoast, Redirection, Rank Math Plugin settings β†’ Export
Edge Services Cloudflare, Fastly, Netlify Dashboard β†’ Rules β†’ Export
Network Platforms Load balancers, CDNs Configuration files
Server Config .htaccess, nginx.conf Direct file access

CMS-Specific Locations

WordPress:

  • Redirection plugin: Tools β†’ Redirection β†’ Export
  • Yoast Premium: SEO β†’ Redirects β†’ Export
  • Database: wp_redirection_items table

Shopify:

  • Admin β†’ Content β†’ URL Redirects β†’ Export

Webflow:

  • Site Settings β†’ Publishing β†’ 301 Redirects

What to Document

For each existing redirect, capture:

Field Example
Source URL /old-page
Destination URL /new-page
Redirect Type 301 or 302
Location Plugin, .htaccess, CDN
Date Created 2024-03-15
⚠️ Multiple Redirect Sources
Many sites have redirects configured in multiple places (CMS, plugins, server, CDN). Audit ALL sources to get a complete picture. Missing one source can cause unexpected redirect behavior.
Consolidate all existing redirects into a single document before migration. This becomes your reference for what's already handled and what might conflict with new redirects.

What’s a Helpful Way to Use Redirect Chain Data?

Clean Up Before Migration

Redirect chains occur when one redirect points to another redirect, creating multiple hops. These hurt SEO and page speed. Migration is the perfect time to eliminate them.

Identifying Redirect Chains

In Screaming Frog:

  1. Crawl your site
  2. Filter by Status Code β†’ 3xx
  3. Look for redirects where Redirect URL is also a redirect

Chain example:

/page-a β†’ 301 β†’ /page-b β†’ 301 β†’ /page-c β†’ 200

This is a 2-hop chain that should become:
/page-a β†’ 301 β†’ /page-c
/page-b β†’ 301 β†’ /page-c

The Chain Resolution Process

  1. Map all redirect chains: Document every A→B→C pattern
  2. Identify final destinations: Find where each chain ultimately leads
  3. Update source redirects: Point directly to final destination
  4. Remove intermediate redirects: Delete unnecessary hops
  5. Verify resolution: Test that chains are eliminated

Common Chain Scenarios

Scenario Before After
HTTP to HTTPS to page http→https→/new http→/new (if HTTPS enforced at server)
Old redirect + new redirect /old→/middle→/new /old→/new, /middle→/new
WWW normalization chain non-www→www→/page non-www→/page (www at DNS level)
⚠️ Chains Waste Crawl Budget
Search engine bots may not follow long redirect chains, meaning pages at the end of chains might not get crawled or indexed properly. Google recommends a maximum of 2 hops.
Use your existing redirect table data to map all chains before creating new redirects. Update your master redirect list so every source URL points directly to its final destination on the new site.

How Do I Create a Unified URL Dataset?

Critical Final Step

After gathering URLs from all sources, combine them into a single, deduplicated dataset. This becomes your master redirect source list.

The Unification Process

Step 1: Standardize formats

  • Remove protocols (https://)
  • Remove domains (www.example.com)
  • Standardize trailing slashes
  • Convert to lowercase (if your site is case-insensitive)

Step 2: Validate through Screaming Frog

Run each URL list through Screaming Frog in List Mode:

  1. Mode β†’ List
  2. Upload your URL list
  3. Start crawl to validate each URL
  4. Export results with status codes

This confirms the current status of every URL across all sources.

Step 3: Combine and deduplicate

Source A: 5,000 URLs
Source B: 3,500 URLs
Source C: 8,200 URLs
Source D: 2,100 URLs
─────────────────────
Combined: 18,800 URLs
After dedup: 12,400 unique URLs

Step 4: Enrich with metadata

Add columns from each source:

URL Status Backlinks Sessions In Sitemap Has Redirect
/page-a 200 45 1,200 Yes No
/page-b 404 12 0 No No
/page-c 301 8 340 Yes Yes
Keep your unified dataset in a version-controlled spreadsheet or database. You'll reference and update it throughout the migration process.

Ready to Map Your URLs?

Once you’ve gathered URLs from all sources and created your unified dataset, the next step is mapping old URLs to new destinations. If you’ve done redirect work before, you know this is traditionally the most time-consuming part of redirect work, but it doesn’t have to be.

Redirects.net uses intelligent matching algorithms to automatically map your old URLs to the best destinations on your new site. Upload your unified URL list, and get mapped redirects ready for implementation.

Try Redirects.net Free β†’