Hello,
I’m looking for advice on handling crawler-driven overload in an Apache
prefork environment.
Environment:
- Apache httpd with prefork MPM
- CentOS 7.4
- ~2 CPU / 4 GB RAM
- prefork must remain in use
Architecture summary:
- Multiple main domains
- Tens of thousands of very small sites, each with its own hostname
- All hostnames are routed through a central VirtualHost using
vhost-level rewrite rules (no .htaccess)
- Each hostname maps dynamically to a directory such as:
/app/sites/{unique-sub-domain-slug}/
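For context, the central VirtualHost is roughly the following shape. This is a sketch only: the hostnames, the rewrite pattern, and the domain are illustrative, not my actual config.

```apache
# Illustrative sketch of the central VirtualHost; example.com and the
# slug pattern are placeholders for the real values.
<VirtualHost *:80>
    ServerName fallback.example.com
    ServerAlias *

    RewriteEngine On
    # Capture the per-site slug from the Host header, e.g. "foo" in
    # foo.example.com, and map the request into that site's directory.
    RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]
    RewriteRule ^/(.*)$ /app/sites/%1/$1 [L]
</VirtualHost>

# The target tree must be reachable:
<Directory /app/sites>
    Require all granted
</Directory>
```

(If the mapping really is just hostname-to-directory, mod_vhost_alias's VirtualDocumentRoot directive can express the same thing without evaluating mod_rewrite rules on every request, which may matter for the bottleneck question below.)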
Under normal conditions the system behaves well.
Issue:
When Googlebot crawls these small sites, Apache load spikes severely
(load averages > 200). The number of httpd processes climbs rapidly and
many sites become unreachable until crawler activity subsides. The main
domains remain responsive during these events.
Steps already taken:
- All rewrite logic moved from .htaccess to VirtualHost
- AllowOverride disabled
- Conservative timeouts and connection limits applied
- Resources increased compared to previous smaller deployment
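For reference, the kind of prefork cap I have been experimenting with looks like the sketch below. The per-child RSS figure is an assumption; it would need to be measured on the actual host (e.g. from top/ps) before trusting the arithmetic.

```apache
# Rough prefork sizing for ~4 GB RAM, assuming ~30 MB RSS per child
# (measure on the real host): 4096 MB * 0.75 / 30 MB ~= 100 workers.
# Capping here trades queued/refused connections for not swapping,
# which is usually the better failure mode under a crawler burst.
<IfModule mpm_prefork_module>
    StartServers            5
    MinSpareServers         5
    MaxSpareServers        10
    ServerLimit           100
    MaxRequestWorkers     100
    MaxConnectionsPerChild 1000
</IfModule>

# Long keep-alives pin prefork children to idle crawler connections;
# a short timeout frees them quickly.
KeepAlive On
KeepAliveTimeout 2
```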
This same design handled ~150 sites reasonably well in the past. With
tens of thousands of sites, overload now happens daily.
My questions:
- Is this a known failure mode of prefork under heavy crawler activity?
- Are there Apache-level techniques to limit crawler impact without
blocking Googlebot?
- In similar setups, what usually becomes the bottleneck first: rewrite
processing, filesystem checks, or process spawning?
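On the second question, one non-blocking lever I am aware of is robots.txt: Googlebot does not honor Crawl-delay (as far as I know its rate is influenced via Search Console, and it backs off on 429/503 responses), but other major crawlers such as Bingbot do honor it. Something like the following (values illustrative) would at least slow the rest of the bot traffic without blocking anyone:

```
# Served as /robots.txt on each hostname; the 10-second value is
# illustrative. Googlebot ignores Crawl-delay; Bingbot and others honor it.
User-agent: *
Crawl-delay: 10

User-agent: Googlebot
Disallow:
```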
Any insight or real-world experience would be greatly appreciated.
Thank you.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]