On 11/3/2019 4:04 PM, Gene Heskett wrote:
> Greetings all
>
> I am developing a list of broken webcrawlers that are repeatedly
> downloading my entire web site, including the hidden stuff.
>
> These crawlers/bots are ignoring my robots.txt files and aren't just
> indexing the site, but are downloading every single bit of every file
> there.
>
> This is burning up my upload bandwidth and constitutes a DDoS when 4 or 5
> bots all go into this pull-it-all mode at the same time.
>
> How do I best deal with these poorly written bots? I can target the
> individual address of course, but have chosen to block the /24. That
> seems not to bother them for more than 30 minutes, and it's also too
> broad a brush, blocking access for legit addresses. Restarting apache2
> also works, for half an hour or so, but I may be interrupting a legit
> request for a realtime kernel whose build tree is around 2.7GB in tgz
> format.
>
> How do I get their attention to stop the DDoS? Or is this a war you
> cannot win?
'fail2ban' for the bots that don't respect robots.txt.

-- 
John Doe
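One way to wire that up is a honeypot path: disallow a path in robots.txt that no legitimate client would ever request, then let fail2ban ban any address that fetches it. A minimal sketch, assuming Apache's combined log format (client IP as the first field) and a default Debian log location; the jail name, the `/bot-trap/` path, and all timing values are illustrative, not from the original post:

```ini
# /etc/fail2ban/filter.d/robots-trap.conf
# Match any request for the trap path. Since robots.txt disallows it,
# only crawlers that ignore robots.txt will ever hit it.
[Definition]
failregex = ^<HOST> .* "(GET|HEAD|POST) /bot-trap/
ignoreregex =
```

```ini
# /etc/fail2ban/jail.local
[robots-trap]
enabled  = true
port     = http,https
filter   = robots-trap
logpath  = /var/log/apache2/access.log
maxretry = 1
bantime  = 86400
```

With `maxretry = 1`, a single hit on the trap path earns a 24-hour ban of that one address, which avoids the too-broad /24 blocks mentioned above. Add `Disallow: /bot-trap/` to robots.txt, then reload with `fail2ban-client reload` and watch the jail with `fail2ban-client status robots-trap`. Note fail2ban also ships an `apache-badbots` filter that bans on known bad User-Agent strings, which may be worth enabling alongside this.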