2026-03-15 · 12 min read

What Happens When You Deploy 140,000 Pages on a Brand-New Domain?

Last week I launched SalaryGlobe.com, a salary database built on Bureau of Labor Statistics data, on a fresh domain with no history, no backlinks, and no existing audience. 140,000+ pages. 818 occupations across 393 metro areas. Zero authority to speak of.

The setup was deliberately minimal: a Cloudflare Worker doing server-side rendering out of a D1 SQLite database, logging every request (URL, user agent, country, and a bot classification against 40+ known crawler patterns). One intentional wrinkle: the sitemap only exposed ~1,000 URLs (top 50 jobs × top 20 cities). The other 139,000+ pages were live and reachable, just not announced.
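For a sense of what that logging path looks like, here's a minimal sketch. The pattern list, table name, and column names are my stand-ins, not the site's actual code; the real list has 40+ patterns.

```javascript
// Hypothetical subset of the bot-pattern list (the real one has 40+ entries).
// Order matters: /Googlebot/ is checked before /GoogleOther/, and neither
// matches the other's user agent.
const BOT_PATTERNS = [
  { name: "GPTBot", re: /GPTBot/i },
  { name: "ClaudeBot", re: /ClaudeBot/i },
  { name: "Googlebot", re: /Googlebot/i },
  { name: "GoogleOther", re: /GoogleOther/i },
];

// Return the first matching bot name, or "Undetected" when nothing matches.
function classifyBot(userAgent) {
  const hit = BOT_PATTERNS.find((p) => p.re.test(userAgent || ""));
  return hit ? hit.name : "Undetected";
}

// Inside the Worker's fetch handler, each request would then be logged
// roughly like this (assumed schema, shown for illustration only):
// await env.DB.prepare(
//   "INSERT INTO requests (url, user_agent, country, bot) VALUES (?, ?, ?, ?)"
// ).bind(url.pathname, ua, request.cf?.country, classifyBot(ua)).run();
```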

Seventy hours later, I had data I wasn't expecting.


The Numbers

| Metric | Value |
| --- | --- |
| Total requests | 301,876 |
| Unique pages hit | 140,315 |
| Observation window | 70.4 hours |
| First request | Mar 12, 12:08 UTC |
| Last request | Mar 15, 10:34 UTC |

Finding 1: AI Crawlers Ate 93% of All Traffic

| Category | Requests | % of Total | Unique Pages |
| --- | --- | --- | --- |
| AI Crawlers | 282,125 | 93.46% | 140,305 |
| Undetected | 17,728 | 5.87% | 11,913 |
| Search Engines | 2,020 | 0.67% | 1,416 |
| Social Media | 2 | 0.00% | 1 |
| Training / Scraping | 1 | 0.00% | 1 |

The "undetected" bucket covers bots that slipped past the pattern matcher; more on those in Finding 7. But the number that stopped me cold: for every single Google request, there were 140 AI crawler requests.

Not 2x. Not 10x. A hundred and forty.

Bot-by-Bot Breakdown

| Bot | Requests | Unique Pages | % of Total |
| --- | --- | --- | --- |
| ClaudeBot | 142,073 | 140,000 | 47.06% |
| GPTBot | 139,952 | 139,940 | 46.36% |
| Undetected | 17,728 | 11,913 | 5.87% |
| GoogleOther | 1,511 | 1,412 | 0.50% |
| Googlebot | 498 | 287 | 0.16% |
| OAI-SearchBot | 78 | 77 | 0.03% |
| ChatGPT-User | 20 | 10 | 0.01% |
| BingBot | 7 | 3 | ~0% |
| Everyone else | 7 | 7 | ~0% |

ClaudeBot and GPTBot combined: 93.4% of all traffic. Google's full crawler fleet, Googlebot plus GoogleOther, clocked in at 0.66%.


Finding 2: They Found Everything. Without the Sitemap.

This is the one that genuinely surprised me.

The sitemap listed roughly 1,000 URLs. But GPTBot crawled 139,940 unique pages, 99.7% of the site. ClaudeBot hit 140,000. Neither one needed the sitemap to get there. They just followed internal links: job pages point to city pages, city pages point back to job pages, every salary page links out to related jobs and nearby metros. The bots walked the whole graph.
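That discovery pattern is just graph traversal. A toy sketch (not real crawler code) of why ~1,000 sitemap seeds are enough: if job pages link to city pages and city pages link back to job pages, a breadth-first walk over internal links reaches the full grid from a single seed. The slugs and URL shapes below mirror the site's paths but are illustrative.

```javascript
// Breadth-first traversal over internal links, starting from sitemap seeds.
function crawlAll(seeds, getLinks) {
  const seen = new Set(seeds);
  const queue = [...seeds];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const next of getLinks(page)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}

// Hypothetical link structure: every /salary/<job> page links to every
// city page, and every /city/<city> page links back to every job page.
const jobs = ["cardiologists", "dermatologists"];
const cities = ["san-diego-ca", "san-francisco-ca"];
const getLinks = (page) =>
  page.startsWith("/salary/")
    ? cities.map((c) => `/city/${c}`)
    : jobs.map((j) => `/salary/${j}`);

// From one seed, the walk reaches all four pages in the toy graph.
const reached = crawlAll(["/salary/cardiologists"], getLinks);
```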

Off-Sitemap Discovery

| Bot | Off-Sitemap Requests | Off-Sitemap Pages Discovered |
| --- | --- | --- |
| GPTBot | 125,465 | 125,454 |
| ClaudeBot | 127,100 | 125,160 |
| Undetected | 12,810 | 9,668 |
| GoogleOther | 914 | 874 |
| Googlebot | 346 | 199 |

GPTBot found 125,454 pages that were never listed anywhere. ClaudeBot found 125,160. Googlebot? 199.
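The off-sitemap counts fall out of a simple set difference over the logs: take each bot's distinct crawled URLs and subtract the ~1,000 sitemap URLs. A sketch with stand-in data (the real computation runs against the D1 log table):

```javascript
// Distinct URLs a bot requested that never appeared in the sitemap.
function offSitemapPages(logRows, sitemapUrls, botName) {
  const sitemap = new Set(sitemapUrls);
  const pages = new Set(
    logRows.filter((r) => r.bot === botName).map((r) => r.url)
  );
  return [...pages].filter((url) => !sitemap.has(url));
}

// Stand-in data, not the real logs:
const sitemapUrls = ["/salary/cardiologists"];
const logRows = [
  { bot: "GPTBot", url: "/salary/cardiologists" },
  { bot: "GPTBot", url: "/salary/dermatologists" }, // found via links
  { bot: "Googlebot", url: "/salary/cardiologists" },
];
```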

Both AI crawlers also swept through nearly every job index page, around 810 of 818 occupations, even though the sitemap only mentioned 50. They didn't wait to be told where to look.


Finding 3: Three Completely Different Playbooks

ClaudeBot: Hits Hard, Leaves Fast

ClaudeBot showed up first (12:53 UTC, Mar 12) and didn't ease into it. At its peak, it was firing 6,516 requests in a single 10-minute window, around 11 per second. It operated in intense bursts with brief quiet gaps, stacking multiple 5,000+ request hours in a row. As the remaining unvisited pages dwindled, the pace came down naturally. By Mar 13, 08:00 UTC, just 19 hours after it arrived, the main crawl was done.

GPTBot: Slow Burn, Never Stops

GPTBot started later (19:20 UTC, Mar 12) and took a completely different approach. It settled into a pace of roughly 1,080–1,100 requests per 10-minute window and just... held it. For 22+ straight hours. About 1.8 requests per second, barely varying. Even as the pool of uncrawled pages shrank, the rate barely flinched. It wrapped up by Mar 14, 00:28 UTC, 34 hours after arriving.

Two bots, same destination, totally different ways of getting there.
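The burst-vs-steady characterization comes from bucketing request timestamps into 10-minute windows. A sketch, assuming per-request epoch-millisecond timestamps (field shapes are my assumption):

```javascript
// Count requests per fixed 10-minute window.
function bucketByWindow(timestamps, windowMs = 10 * 60 * 1000) {
  const counts = new Map();
  for (const ts of timestamps) {
    const bucket = Math.floor(ts / windowMs) * windowMs;
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return counts;
}

// Peak requests/second is the busiest window divided by 600 seconds;
// e.g. ClaudeBot's 6,516-request window works out to ~10.9/s.
const peakRps = (timestamps) =>
  Math.max(...bucketByWindow(timestamps).values()) / 600;
```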

Google: Careful, Deliberate, Still Going

Googlebot averaged single digits per hour. GoogleOther was a bit more active but still nowhere close:

  • Googlebot: 498 requests across 70 hours (~7/hour)
  • GoogleOther: 1,511 requests across 70 hours (~22/hour)

Days later, Google is still slowly working through the site.


Finding 4: The Hour-by-Hour Timeline

Here's how the first 12 hours unfolded:

| Hour (UTC) | Requests | Unique Pages | Active Bots |
| --- | --- | --- | --- |
| Mar 12 12:00 | 89 | 23 | 1 |
| Mar 12 13:00 | 36 | 9 | 1 |
| Mar 12 14:00 | 96 | 59 | 10 |
| Mar 12 15:00 | 6,043 | 6,029 | 3 |
| Mar 12 16:00 | 2,327 | 1,879 | 2 |
| Mar 12 17:00 | 6,446 | 6,116 | 2 |
| Mar 12 18:00 | 1,493 | 1,476 | 2 |
| Mar 12 19:00 | 22,157 | 21,268 | 6 |
| Mar 12 20:00 | 22,894 | 22,534 | 4 |
| Mar 12 21:00 | 12,410 | 11,976 | 4 |
| Mar 12 22:00 | 20,791 | 19,727 | 6 |
| Mar 12 23:00 | 21,255 | 20,468 | 4 |

Daily Summary

| Day | Requests | Unique Pages | Active Bots |
| --- | --- | --- | --- |
| Mar 12 (half day) | 116,037 | 93,885 | 13 |
| Mar 13 | 168,969 | 119,256 | 6 |
| Mar 14 | 4,343 | 3,819 | 6 |
| Mar 15 (partial) | 2,544 | 2,338 | 5 |

The AI crawlers were in and out in 36 hours. Google is still working the queue.


Finding 5: Remarkably Little Wasted Work

| Bot | Unique Pages | Total Requests | Requests / Page |
| --- | --- | --- | --- |
| GPTBot | 139,940 | 139,952 | 1.0 |
| ClaudeBot | 140,000 | 142,073 | 1.0 |
| GoogleOther | 1,412 | 1,511 | 1.1 |
| Googlebot | 287 | 498 | 1.7 |
| ChatGPT-User | 10 | 20 | 2.0 |

GPTBot hit 139,940 unique pages across 139,952 total requests. That's not a typo; it almost never hit the same page twice. ClaudeBot was the same. Both bots are clearly tracking what they've already seen and skipping it. No wheel-spinning.

Googlebot, by contrast, revisited the homepage 9 times and came back to several other pages more than once, a 1.7x ratio overall. The homepage was the only page any bot consistently returned to, accumulating 535 total hits.
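The requests-per-page ratio is just total requests over distinct URLs, per bot. A sketch over stand-in log rows (not the real data):

```javascript
// Ratio of total requests to distinct pages for one bot; 1.0 means the
// bot never revisited a page, higher means repeat crawls.
function requestsPerPage(logRows, botName) {
  const rows = logRows.filter((r) => r.bot === botName);
  const uniquePages = new Set(rows.map((r) => r.url)).size;
  return rows.length / uniquePages;
}

// Stand-in data illustrating a homepage revisit:
const logRows = [
  { bot: "Googlebot", url: "/" },
  { bot: "Googlebot", url: "/" }, // revisit
  { bot: "GPTBot", url: "/salary/cardiologists" },
];
```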


Finding 6: What They Actually Went After

98.8% of all requests went straight to the data pages. The homepage was a jumping-off point, not a destination: bots landed, grabbed the links, and disappeared into the content.

Most Crawled Pages (Excluding Homepage)

| Page | Requests | Distinct Bots |
| --- | --- | --- |
| /salary/cardiologists | 30 | 4 |
| /salary/pediatricians-general | 30 | 5 |
| /city/san-diego-ca | 29 | 4 |
| /city/san-francisco-ca | 29 | 3 |
| /salary/surgeons-all-other | 29 | 2 |
| /salary/airline-pilots-copilots-and-flight-engineers | 28 | 4 |
| /salary/dermatologists | 28 | 5 |

High-paying jobs dominated the top of this list. And every bot that hit these pages did so independently: Anthropic, OpenAI, and Google all converged on the same content without any coordination. Make of that what you will.


Finding 7: The Traffic That Slipped Through

17,728 requests, 5.87% of the total, came from user agents that didn't match anything in the detection list.

| User Agent | Requests | Unique Pages |
| --- | --- | --- |
| Claude-SearchBot/1.0 | 7,052 | 7,035 |
| MJ12bot/v1.4.8 | 4,689 | 4,473 |
| Chrome/42 (Edge/12.246) | 3,667 | 688 |
| serpstatbot/2.1 | 1,079 | 1,027 |
| Android Chrome 117 | 387 | 146 |
| iPhone Safari 13 | 118 | 104 |
| Firefox 102 | 117 | 58 |

Finding 8: Where the Requests Came From

| Country | Requests | % |
| --- | --- | --- |
| US | 281,980 | 96.6% |
| GB | 3,120 | 1.1% |
| FR | 3,037 | 1.0% |
| CA | 1,155 | 0.4% |
| DE | 1,113 | 0.4% |
| UA | 725 | 0.2% |
| FI | 240 | 0.1% |
| CL | 240 | 0.1% |

96.6% of all traffic came from US-based IPs. Every major identified bot crawled exclusively from US infrastructure. The international traffic (GB, FR, CA, DE, Ukraine) came entirely from undetected user agents, likely scrapers and bots running on international infrastructure.


What This Actually Means

AI crawlers have become the first movers on new content. On a domain with no history, no promotion, and zero backlinks, they outnumbered Google 140:1 and mapped the entire site in under 36 hours. Google is still getting there.

These bots don't waste trips. Near-perfect 1:1 request-to-page ratios across hundreds of thousands of requests. They track what they've seen and don't re-crawl it. Efficient in a way that honestly impressed me.

ClaudeBot and GPTBot are built differently. ClaudeBot sprints: 11 requests/second at peak, done in 19 hours. GPTBot runs a marathon: 1.8 requests/second, rock-steady for 22+ hours.

Google's crawl budget on new domains is tiny. 498 Googlebot requests over 70 hours for a 140,000-page site. GoogleOther is about 3x faster, but still, patience is the only real strategy for Google indexing on a new domain.


Methodology

  • Platform: Cloudflare Worker + D1 (SQLite) on salaryglobe.com
  • Data source: BLS OEWS May 2024 occupational employment and wage statistics
  • Tracking: Server-side user-agent matching against 40+ known bot patterns, logged per-request to D1
  • Sitemap: Phase 1 rollout, top 50 jobs × top 20 cities (~1,000 URLs exposed); all other pages live but not announced
  • Observation period: Mar 12, 2026 12:08 UTC → Mar 15, 2026 10:34 UTC (70.4 hours)
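The phase-1 sitemap is just the cross product described above: top 50 jobs × top 20 cities gives the ~1,000 exposed URLs. A sketch; the slug lists and the exact URL shape of the job-in-city pages are my assumptions, not the site's confirmed routes:

```javascript
// Build the phase-1 sitemap URL list from job and city slug arrays.
function buildSitemapUrls(topJobs, topCities) {
  const urls = [];
  for (const job of topJobs) {
    for (const city of topCities) {
      // Assumed URL shape for a job-in-city salary page.
      urls.push(`https://salaryglobe.com/salary/${job}/${city}`);
    }
  }
  return urls;
}

// 50 jobs x 20 cities -> exactly 1,000 URLs.
const urlCount = buildSitemapUrls(
  Array.from({ length: 50 }, (_, i) => `job-${i}`),
  Array.from({ length: 20 }, (_, i) => `city-${i}`)
).length;
```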
