You could skip mod_sec and do the detection directly with fail2ban's
apache-badbots filter, by changing its regex to the following (the
spaces ARE important, so copy and paste it as a single line):
failregex = ^(?:\S+:\d+ )?<ADDR> [^"]*"[A-Z]+ [^"]+" \d+ \d+ "[^"]*" "[^"]*(?:<badbots>|<badbotscustom>)[^"]*"
then adding the bad bots to the start of the "badbots" regex, like this:
badbots = meta-externalagent|facebookexternalhit|SemrushBot|amazonbot|AmazonBot|ClaudeBot|claudebot|Atomic_Email_Hunter/4\.0|
... rest of the regex stays here.
and adding jails like these (the second one watches the Koha Plack log):
[apache-badbots]
enabled = true
port = http,https
filter = apache-badbots
bantime = 48h
logpath = %(apache_access_log)s
maxretry = 1
[apache-badbots2]
enabled = true
port = http,https
filter = apache-badbots
bantime = 48h
logpath = /var/log/koha/USEYOURKOHASITENAMEHERE/plack.log
maxretry = 1
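
Before enabling the jails, it can help to check that the regex matches what you expect. The following is only a rough Python sketch of the failregex above, not fail2ban's own matcher: the ADDR pattern is a simplified stand-in for fail2ban's <ADDR> tag, only the bots added above are listed (the stock list is left out), and the log lines are made-up examples using documentation addresses.

```python
import re

# Assumption: simplified stand-in for fail2ban's <ADDR> tag (IPv4 or IPv6 literal).
ADDR = r"(?P<addr>[0-9a-fA-F:.]+)"

# Only the bots added in the message above; the stock badbots list is omitted.
BADBOTS = (
    r"meta-externalagent|facebookexternalhit|SemrushBot|amazonbot|AmazonBot"
    r"|ClaudeBot|claudebot|Atomic_Email_Hunter/4\.0"
)

# Same shape as the failregex above: optional "vhost:port " prefix, client
# address, request line, status, size, referer, then the user-agent field.
FAILREGEX = re.compile(
    r'^(?:\S+:\d+ )?' + ADDR + r' [^"]*"[A-Z]+ [^"]+" \d+ \d+ "[^"]*" '
    r'"[^"]*(?:' + BADBOTS + r')[^"]*"'
)


def offending_address(line):
    """Return the client address if the line's user agent is a listed bot, else None."""
    match = FAILREGEX.search(line)
    return match.group("addr") if match else None


# Hypothetical log lines, for illustration only.
bot_line = (
    '192.0.2.10 - - [25/Jul/2024:10:15:00 +0000] '
    '"GET /cgi-bin/koha/opac-search.pl?q=a HTTP/1.1" 200 5120 "-" '
    '"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"'
)
vhost_line = (
    'opac.example.org:443 2001:db8::1 - - [25/Jul/2024:10:15:01 +0000] '
    '"GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"'
)
browser_line = (
    '198.51.100.7 - - [25/Jul/2024:10:15:02 +0000] '
    '"GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"'
)

for sample in (bot_line, vhost_line, browser_line):
    print(offending_address(sample))
```

For the real check, fail2ban ships a fail2ban-regex tool that takes a log file and a filter file and reports how many lines matched; running it against your own access log and the edited apache-badbots filter is the more reliable test.
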
On 7/25/24 10:15, Indranil Das Gupta wrote:
Hi Nigel,
My solution for that is a simple two-step process:
1) Use mod_sec to match the UA string of each incoming request against a
list of UAs I don't want, and return an HTTP 406 the first time the UA
matches.
2) Have fail2ban monitor the Apache log for 406 responses and immediately
ban the IP (IPv4 / IPv6) for 96 hours using an apache-badbots jail.
This strategy has so far managed to keep my servers "cool".
cheers
-idg
On Thu, Jul 25, 2024, 16:57 Nigel Titley <ni...@titley.com> wrote:
Is anyone else getting problems with the facebook web crawler hammering
their OPAC search function?
This has been happening on and off for a couple of months but set in
with a vengeance a couple of days ago. The crawler is hitting us with
many OPAC search queries, beyond the capacity of our system to respond.
robots.txt is being ignored.
I started by blocking facebook's entire IPv6 range as the queries were
all coming in over IPv6. They responded by switching to IPv4 and because
they have a number of blocks it wasn't practical to block each and every
one of them.
I've temporarily switched off OPAC entirely and the system has returned
to normal and I can at least perform intranet functions but this is
obviously non-ideal.
Does anyone have any thoughts on this?
I'm running 22.05.13.000 on Ubuntu.
Thanks
Nigel
_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
--
Hector Gonzalez
ca...@genac.org