At 03:44 PM 8.14.2003 +0100, Jez Hancock wrote:
>On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
>> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
>> The above is typical of the servers in use, and with csh shells employed,
>> plus IPFW.
>>
>> My apologies for the length of this question, but the background seems
>> necessary as brief as I can make it so the question makes sense.
>>
>> The problem:
>> We have several servers that provide online reading of Technical articles
>> and each have several hundred MB to a GB of content.
>>
>> When we started providing the articles 6-7 years ago, folks used browsers
>> to read the articles. Now, the trend has become a more lazy approach and
>> there is an increasing use of those download utilities which can be left
>> unattended to download entire web sites taking several hours to do so.
>> Multiply this by a number of similar downloads and there goes the
>> bandwidth, denying those other normal online readers the speed needed for
>> loading and browsing in the manner intended. Several hundred will be
>> reading at a time and several 1000 daily.
><snip>
>There is no easy solution to this, but one avenue might be to look at
>bandwidth throttling in an apache module.
>
>One that I've used before is mod_throttle which is in the ports:
>
>/usr/ports/www/mod_throttle
>
>which allows you to throttle users by ip address to a certain number of
>documents and/or up to a certain transfer limit. IIRC it's fairly
>limited though in that you can only apply per IP limits to _every_
>virtual host - ie in the global httpd.conf context.
>
>A more finegrained solution (from what I've read, haven't tried it) is
>mod_bwshare - this one isn't in the ports but can be found here:
>
>http://www.topology.org/src/bwshare/
>
>this module overcomes some of the shortfalls of mod_throttle and allows
>you to specify finer granularity over who consumes how much bandwidth
>over what time period.
>
>> Now, my question: Is it possible to write a script that can constantly scan
>> the Apache logs to look for certain footprints of those downloaders,
>> perhaps the names, like "HTTRACK", being one I see a lot. Whenever I see
>> one of those sessions, I have been able to abort them by adding a rule to
>> the firewall to deny the IP address access to the server. This aborts the
>> downloading, but have seen the attempts constantly continue for a day or
>> two, confirming unattended downloads.
>>
>> Thus, if the script could spot an "offender" and then perhaps make use of
>> the firewall to add a rule containing the offender's IP address and then
>> flush to reset the firewall, this would at least abort the download and
>> free up the bandwidth (I already have a script that restarts the firewall).
>>
>> Is this possible and how would I go about it....???
>If you really wanted to go down this route then I found a script someone
>wrote a while back to find 'rude robots' from a httpd logfile which you
>could perhaps adapt to do dynamic filtering in conjunction with your
>firewall:
>
>http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html
>
>If you have any success let me know.
>
>--
>Jez
>
Interesting. Looks like a step in the right direction. I'll weigh this one
along with the other possibilities.

Many thanks...!

Best regards,
Jack L. Stone,
Administrator

SageOne Net
http://www.sage-one.net
[EMAIL PROTECTED]
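
For reference, the log-scanning idea discussed above could look roughly like
the sketch below. It is only an illustration, not the script from the linked
page: it assumes the access log is in Apache's Combined Log Format (so the
User-Agent is the last quoted field), and the names find_offenders,
ipfw_rules, BAD_AGENTS and the rule number 500 are all made up for the
example. It only builds the ipfw command strings and does not run them, so
the output can be reviewed first. As far as I know, a rule added with
"ipfw add" takes effect immediately, so a flush/restart should not be needed.

```python
import ipaddress
import re

# Assumption: Combined Log Format, i.e.
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical blocklist of User-Agent substrings, matched case-insensitively.
BAD_AGENTS = ("httrack",)


def find_offenders(lines, bad_agents=BAD_AGENTS):
    """Return client IPs (first-seen order) whose User-Agent matches the blocklist."""
    offenders = []
    seen = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        ip = match.group("ip")
        agent = match.group("agent").lower()
        if ip in seen or not any(bad in agent for bad in bad_agents):
            continue
        try:
            # Reject hostnames or junk so nothing odd ends up in a firewall rule.
            ipaddress.ip_address(ip)
        except ValueError:
            continue
        seen.add(ip)
        offenders.append(ip)
    return offenders


def ipfw_rules(ips, rule_number=500):
    """Build (but do not execute) one ipfw deny command per offending IP."""
    return [f"ipfw add {rule_number} deny ip from {ip} to any" for ip in ips]
```

Feeding it the lines of an access log, e.g.
ipfw_rules(find_offenders(open(path))) with path pointing at the log file,
gives a list of commands that can be checked by hand before being passed to
the shell. Keep in mind that the User-Agent string is trivial to fake, so this
only catches downloaders that identify themselves honestly.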