On Sunday 03 November 2019 10:34:09 john doe wrote:
> On 11/3/2019 4:04 PM, Gene Heskett wrote:
> > Greetings all
> >
> > I am developing a list of broken webcrawlers who are repeatedly
> > downloading my entire web site, including the hidden stuff.
> >
> > These crawlers/bots are ignoring my robots.txt files and aren't just
> > indexing the site, but are downloading every single bit of every
> > file there.
> >
> > This is burning up my upload bandwidth and constitutes a DDOS when 4
> > or 5 bots all go into this pull-it-all mode at the same time.
> >
> > How do I best deal with these poorly written bots? I can target the
> > individual address of course, but have chosen to block the /24, but
> > that seems not to bother them for more than 30 minutes. It's also
> > too broad a brush, blocking legit addresses' access. Restarting
> > apache2 also works, for half an hour or so, but I may be
> > interrupting a legit request for a realtime kernel whose build tree
> > is around 2.7GB in tgz format.
> >
> > How do I get their attention to stop the DDOS? Or is this a war you
> > cannot win?
>
> 'fail2ban' for the bots that do not respect robots.txt.

It wasn't installed by this stretch version. It is now; reading man
pages. Frankly, this looks dangerous when run by beginning users. There
ought to be a startup tutorial covering setting up the logging, then
specifying whom you want blocked based on reading the logs. Is there a
formal tutorial on setting this up someplace?
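For what it's worth, one common fail2ban pattern for robots.txt-ignoring
bots is a honeypot: add a path to robots.txt that no legitimate crawler
should ever fetch, then ban any client that requests it. The sketch
below assumes a Debian-style Apache access log; the jail name
(apache-robots-trap) and the trap URL (/bot-trap/) are made-up examples,
not anything from an existing setup:

```
# /etc/apache2/... served robots.txt -- disallow a trap path nothing links to:
#   User-agent: *
#   Disallow: /bot-trap/

# /etc/fail2ban/filter.d/apache-robots-trap.conf
[Definition]
# Match any request for the trap path in a combined-format access log
failregex = ^<HOST> .*"(GET|POST|HEAD) /bot-trap/
ignoreregex =

# /etc/fail2ban/jail.local
[apache-robots-trap]
enabled  = true
port     = http,https
filter   = apache-robots-trap
logpath  = /var/log/apache2/access.log
maxretry = 1
bantime  = 86400
```

A compliant crawler never sees the trap (robots.txt forbids it); a
broken one fetches it on the first pass and gets its single address
banned for a day, which is a much narrower brush than blocking the /24.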
Thanks John Doe.

> --
> John Doe

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law
respectable. - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>