2026-03-15 · 12 min read

What Happens When You Deploy 140,000 Pages on a Brand-New Domain?

Last week I launched SalaryGlobe.com, a salary database built on Bureau of Labor Statistics data, on a fresh domain with no history, no backlinks, and no existing audience. 140,000+ pages. 818 occupations across 393 metro areas. Zero authority to speak of.

The setup was deliberately minimal: a Cloudflare Worker doing server-side rendering out of a D1 SQLite database, logging every request (URL, user agent, country, and a bot classification against 40+ known crawler patterns). One intentional wrinkle: the sitemap only exposed ~1,000 URLs (top 50 jobs × top 20 cities). The other 139,000+ pages were live and reachable, just not announced.
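For a sense of what that logging path looks like, here's a minimal sketch. The pattern list, table name, and column names are my stand-ins, not the site's actual code; the real list has 40+ patterns.

```javascript
// Hypothetical subset of the bot-pattern list (the real one has 40+ entries).
// Order matters: /Googlebot/ is checked before /GoogleOther/, and neither
// matches the other's user agent.
const BOT_PATTERNS = [
  { name: "GPTBot", re: /GPTBot/i },
  { name: "ClaudeBot", re: /ClaudeBot/i },
  { name: "Googlebot", re: /Googlebot/i },
  { name: "GoogleOther", re: /GoogleOther/i },
];

// Return the first matching bot name, or "Undetected" when nothing matches.
function classifyBot(userAgent) {
  const hit = BOT_PATTERNS.find((p) => p.re.test(userAgent || ""));
  return hit ? hit.name : "Undetected";
}

// Inside the Worker's fetch handler, each request would then be logged
// roughly like this (assumed schema, shown for illustration only):
// await env.DB.prepare(
//   "INSERT INTO requests (url, user_agent, country, bot) VALUES (?, ?, ?, ?)"
// ).bind(url.pathname, ua, request.cf?.country, classifyBot(ua)).run();
```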

Seventy hours later, I had data I wasn't expecting.


The Numbers

| Metric | Value |
| --- | --- |
| Total requests | 301,876 |
| Unique pages hit | 140,315 |
| Observation window | 70.4 hours |
| First request | Mar 12, 12:08 UTC |
| Last request | Mar 15, 10:34 UTC |

Finding 1: AI Crawlers Ate 93% of All Traffic

| Category | Requests | % of Total | Unique Pages |
| --- | --- | --- | --- |
| AI Crawlers | 282,125 | 93.46% | 140,305 |
| Undetected | 17,728 | 5.87% | 11,913 |
| Search Engines | 2,020 | 0.67% | 1,416 |
| Social Media | 2 | 0.00% | 1 |
| Training / Scraping | 1 | 0.00% | 1 |

The "undetected" bucket covers bots that slipped past the pattern matcher; more on those in Finding 7. But the number that stopped me cold: for every single Google request, there were 140 AI crawler requests.

Not 2x. Not 10x. A hundred and forty.

Bot-by-Bot Breakdown

| Bot | Requests | Unique Pages | % of Total |
| --- | --- | --- | --- |
| ClaudeBot | 142,073 | 140,000 | 47.06% |
| GPTBot | 139,952 | 139,940 | 46.36% |
| Undetected | 17,728 | 11,913 | 5.87% |
| GoogleOther | 1,511 | 1,412 | 0.50% |
| Googlebot | 498 | 287 | 0.16% |
| OAI-SearchBot | 78 | 77 | 0.03% |
| ChatGPT-User | 20 | 10 | 0.01% |
| BingBot | 7 | 3 | ~0% |
| Everyone else | 7 | 7 | ~0% |

ClaudeBot and GPTBot combined: 93.4% of all traffic. Google's full crawler fleet, Googlebot plus GoogleOther, clocked in at 0.66%.


Finding 2: They Found Everything. Without the Sitemap.

This is the one that genuinely surprised me.

The sitemap listed roughly 1,000 URLs. But GPTBot crawled 139,940 unique pages, 99.7% of the site. ClaudeBot hit 140,000. Neither one needed the sitemap to get there. They just followed internal links: job pages point to city pages, city pages point back to job pages, every salary page links out to related jobs and nearby metros. The bots walked the whole graph.
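That discovery pattern is just graph traversal. A toy sketch (not real crawler code) of why ~1,000 sitemap seeds are enough: if job pages link to city pages and city pages link back to job pages, a breadth-first walk over internal links reaches the full grid from a single seed. The slugs and URL shapes below mirror the site's paths but are illustrative.

```javascript
// Breadth-first traversal over internal links, starting from sitemap seeds.
function crawlAll(seeds, getLinks) {
  const seen = new Set(seeds);
  const queue = [...seeds];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const next of getLinks(page)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}

// Hypothetical link structure: every /salary/<job> page links to every
// city page, and every /city/<city> page links back to every job page.
const jobs = ["cardiologists", "dermatologists"];
const cities = ["san-diego-ca", "san-francisco-ca"];
const getLinks = (page) =>
  page.startsWith("/salary/")
    ? cities.map((c) => `/city/${c}`)
    : jobs.map((j) => `/salary/${j}`);

// From one seed, the walk reaches all four pages in the toy graph.
const reached = crawlAll(["/salary/cardiologists"], getLinks);
```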

Off-Sitemap Discovery

| Bot | Off-Sitemap Requests | Off-Sitemap Pages Discovered |
| --- | --- | --- |
| GPTBot | 125,465 | 125,454 |
| ClaudeBot | 127,100 | 125,160 |
| Undetected | 12,810 | 9,668 |
| GoogleOther | 914 | 874 |
| Googlebot | 346 | 199 |

GPTBot found 125,454 pages that were never listed anywhere. ClaudeBot found 125,160. Googlebot? 199.
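The off-sitemap counts fall out of a simple set difference over the logs: take each bot's distinct crawled URLs and subtract the ~1,000 sitemap URLs. A sketch with stand-in data (the real computation runs against the D1 log table):

```javascript
// Distinct URLs a bot requested that never appeared in the sitemap.
function offSitemapPages(logRows, sitemapUrls, botName) {
  const sitemap = new Set(sitemapUrls);
  const pages = new Set(
    logRows.filter((r) => r.bot === botName).map((r) => r.url)
  );
  return [...pages].filter((url) => !sitemap.has(url));
}

// Stand-in data, not the real logs:
const sitemapUrls = ["/salary/cardiologists"];
const logRows = [
  { bot: "GPTBot", url: "/salary/cardiologists" },
  { bot: "GPTBot", url: "/salary/dermatologists" }, // found via links
  { bot: "Googlebot", url: "/salary/cardiologists" },
];
```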

Both AI crawlers also swept through nearly every job index page, around 810 of 818 occupations, even though the sitemap only mentioned 50. They didn't wait to be told where to look.


Finding 3: Three Completely Different Playbooks

ClaudeBot: Hits Hard, Leaves Fast

ClaudeBot showed up first (12:53 UTC, Mar 12) and didn't ease into it. At its peak, it was firing 6,516 requests in a single 10-minute window, around 11 per second. It operated in intense bursts with brief quiet gaps, stacking multiple 5,000+ request hours in a row. As the remaining unvisited pages dwindled, the pace came down naturally. By Mar 13, 08:00 UTC, just 19 hours after it arrived, the main crawl was done.

GPTBot: Slow Burn, Never Stops

GPTBot started later (19:20 UTC, Mar 12) and took a completely different approach. It settled into a pace of roughly 1,080–1,100 requests per 10-minute window and just... held it. For 22+ straight hours. About 1.8 requests per second, barely varying. Even as the pool of uncrawled pages shrank, the rate barely flinched. It wrapped up by Mar 14, 00:28 UTC, 34 hours after arriving.

Two bots, same destination, totally different ways of getting there.
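The burst-vs-steady characterization comes from bucketing request timestamps into 10-minute windows. A sketch, assuming per-request epoch-millisecond timestamps (field shapes are my assumption):

```javascript
// Count requests per fixed 10-minute window.
function bucketByWindow(timestamps, windowMs = 10 * 60 * 1000) {
  const counts = new Map();
  for (const ts of timestamps) {
    const bucket = Math.floor(ts / windowMs) * windowMs;
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return counts;
}

// Peak requests/second is the busiest window divided by 600 seconds;
// e.g. ClaudeBot's 6,516-request window works out to ~10.9/s.
const peakRps = (timestamps) =>
  Math.max(...bucketByWindow(timestamps).values()) / 600;
```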

Google: Careful, Deliberate, Still Going

Googlebot averaged single digits per hour. GoogleOther was a bit more active but still nowhere close:

  • Googlebot: 498 requests across 70 hours (~7/hour)
  • GoogleOther: 1,511 requests across 70 hours (~22/hour)

Days later, Google is still slowly working through the site.


Finding 4: The Hour-by-Hour Timeline

Here's how the first 12 hours unfolded:

| Hour (UTC) | Requests | Unique Pages | Active Bots |
| --- | --- | --- | --- |
| Mar 12 12:00 | 89 | 23 | 1 |
| Mar 12 13:00 | 36 | 9 | 1 |
| Mar 12 14:00 | 96 | 59 | 10 |
| Mar 12 15:00 | 6,043 | 6,029 | 3 |
| Mar 12 16:00 | 2,327 | 1,879 | 2 |
| Mar 12 17:00 | 6,446 | 6,116 | 2 |
| Mar 12 18:00 | 1,493 | 1,476 | 2 |
| Mar 12 19:00 | 22,157 | 21,268 | 6 |
| Mar 12 20:00 | 22,894 | 22,534 | 4 |
| Mar 12 21:00 | 12,410 | 11,976 | 4 |
| Mar 12 22:00 | 20,791 | 19,727 | 6 |
| Mar 12 23:00 | 21,255 | 20,468 | 4 |

Daily Summary

| Day | Requests | Unique Pages | Active Bots |
| --- | --- | --- | --- |
| Mar 12 (half day) | 116,037 | 93,885 | 13 |
| Mar 13 | 168,969 | 119,256 | 6 |
| Mar 14 | 4,343 | 3,819 | 6 |
| Mar 15 (partial) | 2,544 | 2,338 | 5 |

The AI crawlers were in and out in 36 hours. Google is still working the queue.


Finding 5: Remarkably Little Wasted Work

| Bot | Unique Pages | Total Requests | Requests / Page |
| --- | --- | --- | --- |
| GPTBot | 139,940 | 139,952 | 1.0 |
| ClaudeBot | 140,000 | 142,073 | 1.0 |
| GoogleOther | 1,412 | 1,511 | 1.1 |
| Googlebot | 287 | 498 | 1.7 |
| ChatGPT-User | 10 | 20 | 2.0 |

GPTBot hit 139,940 unique pages across 139,952 total requests. That's not a typo; it almost never hit the same page twice. ClaudeBot was the same. Both bots are clearly tracking what they've already seen and skipping it. No wheel-spinning.

Googlebot, by contrast, revisited the homepage 9 times and came back to several other pages more than once, a 1.7x ratio overall. The homepage was the only page any bot consistently returned to, accumulating 535 total hits.
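The requests-per-page ratio is just total requests over distinct URLs, per bot. A sketch over stand-in log rows (not the real data):

```javascript
// Ratio of total requests to distinct pages for one bot; 1.0 means the
// bot never revisited a page, higher means repeat crawls.
function requestsPerPage(logRows, botName) {
  const rows = logRows.filter((r) => r.bot === botName);
  const uniquePages = new Set(rows.map((r) => r.url)).size;
  return rows.length / uniquePages;
}

// Stand-in data illustrating a homepage revisit:
const logRows = [
  { bot: "Googlebot", url: "/" },
  { bot: "Googlebot", url: "/" }, // revisit
  { bot: "GPTBot", url: "/salary/cardiologists" },
];
```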


Finding 6: What They Actually Went After

98.8% of all requests went straight to the data pages. The homepage was a jumping-off point, not a destination: bots landed, grabbed the links, and disappeared into the content.

Most Crawled Pages (Excluding Homepage)

| Page | Requests | Distinct Bots |
| --- | --- | --- |
| /salary/cardiologists | 30 | 4 |
| /salary/pediatricians-general | 30 | 5 |
| /city/san-diego-ca | 29 | 4 |
| /city/san-francisco-ca | 29 | 3 |
| /salary/surgeons-all-other | 29 | 2 |
| /salary/airline-pilots-copilots-and-flight-engineers | 28 | 4 |
| /salary/dermatologists | 28 | 5 |

High-paying jobs dominated the top of this list. And every bot that hit these pages did so independently: Anthropic, OpenAI, and Google all converged on the same content without any coordination. Make of that what you will.


Finding 7: The Traffic That Slipped Through

17,728 requests, 5.87% of the total, came from user agents that didn't match anything in the detection list.

| User Agent | Requests | Unique Pages |
| --- | --- | --- |
| Claude-SearchBot/1.0 | 7,052 | 7,035 |
| MJ12bot/v1.4.8 | 4,689 | 4,473 |
| Chrome/42 (Edge/12.246) | 3,667 | 688 |
| serpstatbot/2.1 | 1,079 | 1,027 |
| Android Chrome 117 | 387 | 146 |
| iPhone Safari 13 | 118 | 104 |
| Firefox 102 | 117 | 58 |

Finding 8: Where the Requests Came From

| Country | Requests | % |
| --- | --- | --- |
| US | 281,980 | 96.6% |
| GB | 3,120 | 1.1% |
| FR | 3,037 | 1.0% |
| CA | 1,155 | 0.4% |
| DE | 1,113 | 0.4% |
| UA | 725 | 0.2% |
| FI | 240 | 0.1% |
| CL | 240 | 0.1% |

96.6% of all traffic came from US-based IPs. Every major identified bot crawled exclusively from US infrastructure. The international traffic (GB, FR, CA, DE, Ukraine) came entirely from undetected user agents, likely scrapers and bots running on international infrastructure.


What This Actually Means

AI crawlers have become the first movers on new content. On a domain with no history, no promotion, and zero backlinks, they outnumbered Google 140:1 and mapped the entire site in under 36 hours. Google is still getting there.

These bots don't waste trips. Near-perfect 1:1 request-to-page ratios across hundreds of thousands of requests. They track what they've seen and don't re-crawl it. Efficient in a way that honestly impressed me.

ClaudeBot and GPTBot are built differently. ClaudeBot sprints: 11 requests/second at peak, done in 19 hours. GPTBot runs a marathon: 1.8 requests/second, rock-steady for 22+ hours.

Google's crawl budget on new domains is tiny. 498 Googlebot requests over 70 hours for a 140,000-page site. GoogleOther is about 3x faster, but still, patience is the only real strategy for Google indexing on a new domain.


Methodology

  • Platform: Cloudflare Worker + D1 (SQLite) on salaryglobe.com
  • Data source: BLS OEWS May 2024 occupational employment and wage statistics
  • Tracking: Server-side user-agent matching against 40+ known bot patterns, logged per-request to D1
  • Sitemap: Phase 1 rollout, top 50 jobs × top 20 cities (~1,000 URLs exposed); all other pages live but not announced
  • Observation period: Mar 12, 2026 12:08 UTC → Mar 15, 2026 10:34 UTC (70.4 hours)
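The phase-1 sitemap is just the cross product described above: top 50 jobs × top 20 cities gives the ~1,000 exposed URLs. A sketch; the slug lists and the exact URL shape of the job-in-city pages are my assumptions, not the site's confirmed routes:

```javascript
// Build the phase-1 sitemap URL list from job and city slug arrays.
function buildSitemapUrls(topJobs, topCities) {
  const urls = [];
  for (const job of topJobs) {
    for (const city of topCities) {
      // Assumed URL shape for a job-in-city salary page.
      urls.push(`https://salaryglobe.com/salary/${job}/${city}`);
    }
  }
  return urls;
}

// 50 jobs x 20 cities -> exactly 1,000 URLs.
const urlCount = buildSitemapUrls(
  Array.from({ length: 50 }, (_, i) => `job-${i}`),
  Array.from({ length: 20 }, (_, i) => `city-${i}`)
).length;
```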
