> Nope, that page is served out by Apache using its autoindex module.
>
> Gerard, we could just configure Apache to use
> 'SuppressColumnSorting' 
> (http://httpd.apache.org/docs/2.2/mod/mod_autoindex.html#indexoptions) - it 
> won't stop bots from downloading masses of data if that's what they're intent 
> on doing, but for otherwise innocent scripts that are being tripped up by the 
> column sorting hyperlinks, it'll prevent them getting multiple copies of 
> everything.
>

You beat me to it. That will fix the recursive wget issue, but not 
entirely the "acts like a bot, downloads the entire website, but won't 
honour robots.txt" behaviour that Bruce alluded to as well.
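
For reference, the quoted suggestion boils down to something like the 
following in the relevant vhost or .htaccess (the directory path here is 
only a placeholder; adjust it to wherever the listings are served from):

  <Directory "/srv/www/lfs">
      # Keep the fancy directory listings, but drop the ?C=N;O=D style
      # column-sorting links that recursive clients follow and end up
      # fetching the same listing several times over.
      IndexOptions FancyIndexing SuppressColumnSorting
  </Directory>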

Over the last two days alone there have been quite a few unique IP 
addresses that used wget to download almost all of 
www.linuxfromscratch.org (682 MB) and, before I disabled it, the 
mailing list archives as well. The total impact of most recursive 
downloads was a few GB per IP address. Multiply that over a full day, 
then over a full month, and that's a few hundred GB.

What we really need is a proper packet shaper: let everybody download 
what they want, but up to a maximum rate (Mbps) and amount (MB) per time 
unit. We'll have that capability on the new server. Not on the current 
one without spending more time and effort than it's worth, seeing as 
we're moving off of it regardless. I'm more interested in simple 
stop-gap measures while we're preparing the migration. That process 
won't take more than a few days to complete once it's in full swing.
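
To give an idea of the kind of shaping I mean, the new box could do 
something along these lines with the standard Linux traffic control 
tools (the interface name and rate are placeholders, and a per-IP byte 
quota would still need separate accounting on top of the rate cap):

  # Cap all traffic leaving eth0 at roughly 10 Mbit/s via an HTB qdisc;
  # everything falls into the default class 1:10.
  tc qdisc add dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 10mbit ceil 10mbit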

Gerard