> Nope, that page is served out by Apache using its autoindex module.
>
> Gerard, we could just configure Apache to use 'SuppressColumnSorting'
> (http://httpd.apache.org/docs/2.2/mod/mod_autoindex.html#indexoptions) - it
> won't stop bots from downloading masses of data if that's what they're intent
> on doing, but for otherwise innocent scripts that are being tripped up by the
> column sorting hyperlinks, it'll prevent them getting multiple copies of
> everything.
You beat me to it. That will fix the recursive wget issue, but not entirely
the "acts like a bot, downloads the entire website, but won't honour
robots.txt" behaviour that Bruce alluded to as well.

Over just the last two days there have been quite a few unique IP addresses
using wget to download almost all of www.linuxfromscratch.org (682 MB) and,
before I disabled it, the mailing list archives as well. The total impact of
most recursive downloads was a few GB per IP address. Multiply that over a
full day and then over a full month and that's a few hundred GB.

What we really need is a proper packet shaper: let everybody download what
they want, but up to a maximum rate (Mbps) and amount (MB) per time unit.
We'll have that capability on the new server. We won't have it on the current
one without spending more time and effort than it is worth, seeing as we're
moving off of it regardless. I'm more interested in simple stop-gap measures
while we're preparing the migration. That process won't take more than a few
days to complete once it's in full swing. Rough sketches of both the
IndexOptions tweak and the shaping idea are below.

Gerard
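For the record, the SuppressColumnSorting change is just an extra keyword on
the IndexOptions line. Something along these lines should do it (the directory
path and the other options here are placeholders; the real vhost config on the
server will differ):

    # Keep the fancy directory listings but drop the column-sorting links,
    # so a recursive wget no longer fetches every ?C=N;O=D sort permutation.
    <Directory "/srv/www/linuxfromscratch">
        Options +Indexes
        IndexOptions FancyIndexing SuppressColumnSorting
    </Directory>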
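And to make "packet shaper" a little more concrete: the rate part on the new
server could be as simple as a token bucket on the outgoing interface. A
rough, untested sketch (interface name and numbers made up; the per-IP byte
quota would still need something on top of this, e.g. per-source accounting in
the firewall or a bandwidth-limiting module in Apache):

    # Cap all outbound traffic on eth0 at roughly 10 Mbit/s with a small
    # burst, so a single greedy crawler can't saturate the uplink.
    tc qdisc add dev eth0 root tbf rate 10mbit burst 64kb latency 400ms

    # Remove the shaping again:
    tc qdisc del dev eth0 root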