On 11/3/2019 5:32 PM, Gene Heskett wrote:
> On Sunday 03 November 2019 10:34:09 john doe wrote:
>
>> On 11/3/2019 4:04 PM, Gene Heskett wrote:
>>> Greetings all
>>>
>>> I am developing a list of broken webcrawlers who are repeatedly
>>> downloading my entire web site including the hidden stuff.
>>>
>>> These crawlers/bots are ignoring my robots.txt files and aren't just
>>> indexing the site, but are downloading every single bit of every
>>> file there.
>>>
>>> This is burning up my upload bandwidth and constitutes a DDOS when 4
>>> or 5 bots all go into this pull it all mode at the same time.
>>>
>>> How do I best deal with these poorly written bots? I can target the
>>> individual address of course, but have chosen to block the /24, but
>>> that seems not to bother them for more than 30 minutes. Its also a
>>> too broad brush, blocking legit addresses access. Restarting apache2
>>> also work, for half an hour or so, but I may be interrupting a legit
>>> request for a realtime kernel whose built tree is around 2.7GB in
>>> tgz format
>>>
>>> How do I get their attention to stop the DDOS? Or is this a war you
>>> cannot win?
>>
>> 'fail2ban' for the bots that does not respect robot.txt.
>>
> Wasn't installed by this stretch version. Is now, reading man page's.
> Frankly this looks dangerous when attempted to be run as beginning users.
> There ought to be a startup tutorial based on setting up the logging,
> then specifying who you want blocked from reading the logs. Is there a
> formal tut of setting this up someplace?
>
Those are more hints than a howto:

https://askubuntu.com/questions/1116001/block-badbot-with-fail2ban-via-user-agents-in-access-log
https://www.booleanworld.com/blocking-bad-bots-fail2ban/

Or with iptables:

https://blog.nintechnet.com/how-to-block-w00tw00t-at-isc-sans-dfind-and-other-web-vulnerability-scanners/
https://javapipe.com/blog/iptables-ddos-protection/

I guess I would implement both approaches.

Does your website really need to be available to the world?
Can't you consider a VPS with anti-DDoS capability?

HTH.

--
John Doe
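
P.S. Before turning on any automatic banning, it may help to see which addresses would be caught. Below is a rough Python sketch of the log-scanning idea behind the links above; it is not how fail2ban itself is implemented. It assumes the access log is in Apache's "combined" format. The names `find_offenders` and `BAD_AGENTS`, the bot names in the list, and the request threshold are all made up for illustration, so adjust them to what you actually see in your logs.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format, i.e.
# IP ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent fragments; replace with the bots seen in your own logs.
BAD_AGENTS = ("ExampleBot", "GreedyCrawler")


def find_offenders(lines, max_requests=100, bad_agents=BAD_AGENTS):
    """Return a sorted list of client IPs that either sent a listed
    user agent or made more than max_requests requests."""
    hits = Counter()
    flagged = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        ip = match.group("ip")
        hits[ip] += 1
        agent = match.group("agent").lower()
        # Case-insensitive substring match against the bad-agent list
        if any(bad.lower() in agent for bad in bad_agents):
            flagged.add(ip)
    # Also flag heavy hitters regardless of their user agent
    flagged.update(ip for ip, count in hits.items() if count > max_requests)
    return sorted(flagged)
```

You could feed it an open file object for your access log (for example, `find_offenders(open(path))`, with `path` pointing at wherever your Apache writes it) and review the output by hand. A legit download of a 2.7GB tarball is a single request, so a per-IP request count should not trip on it the way a /24 block does.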