Hello,

I’m looking for advice on handling crawler-driven overload in an Apache prefork environment.

Environment:
- Apache httpd with prefork MPM
- CentOS 7.4
- ~2 CPU / 4 GB RAM
- prefork must remain in use

Architecture summary:
- Multiple main domains
- Tens of thousands of very small sites, each with its own hostname
- All hostnames are routed through a central VirtualHost using vhost-level rewrite rules (no .htaccess)
- Each hostname maps dynamically to a directory such as: /app/sites/{unique-sub-domain-slug}/
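For reference, the vhost-level mapping looks roughly like this (hostnames, paths, and the rewrite pattern are simplified placeholders, not the exact production rules):

```apache
<VirtualHost *:80>
    # Catch-all vhost for the small sites; main domains have their own vhosts.
    ServerName sites.example.com
    ServerAlias *

    RewriteEngine On
    # Extract the unique slug from the leftmost label of the Host header
    # and map the request into that site's directory.
    RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\. [NC]
    RewriteRule ^/(.*)$ /app/sites/%1/$1 [L]
</VirtualHost>
```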

Under normal conditions the system behaves well.

Issue:
When Googlebot crawls these small sites, Apache load spikes severely (load averages > 200). httpd processes grow rapidly and many sites become unreachable until crawler activity subsides. Main domains remain responsive during these events.

Steps already taken:
- All rewrite logic moved from .htaccess to VirtualHost
- AllowOverride disabled
- Conservative timeouts and connection limits applied
- Resources increased compared to previous smaller deployment
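The timeout and connection limits currently applied are roughly as follows (values illustrative, trimmed for a 2 CPU / 4 GB box; not the exact production file):

```apache
# prefork MPM sizing, kept conservative so RSS of all
# httpd children stays well under the 4 GB of RAM.
<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    MaxRequestWorkers      150
    MaxConnectionsPerChild 1000
</IfModule>

Timeout              30
KeepAlive            On
KeepAliveTimeout     2
MaxKeepAliveRequests 100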

This same design handled ~150 sites without trouble in the past. At the current scale (tens of thousands of hostnames), the overload now occurs daily.

My questions:
- Is this a known failure mode of prefork under heavy crawler activity?
- Are there Apache-level techniques to limit crawler impact without blocking Googlebot?
- In similar setups, what usually becomes the bottleneck first: rewrite processing, filesystem checks, or process spawning?

Any insight or real-world experience would be greatly appreciated.

Thank you.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]