Is anyone else having problems with the Facebook web crawler hammering
their OPAC search function?
This has been happening on and off for a couple of months but came back
with a vengeance a couple of days ago. The crawler is hitting us with
many OPAC search queries, beyond the capacity of our system.
Hi Nigel
In such a case I would advise creating a sitemap - unfortunately this
Koha feature does not seem to be so well documented, but the following
may give you a start:
* https://lists.katipo.co.nz/public/koha/2020-November/055401.html
*
https://wiki.koha-community.org/wiki/Commands_provided_by_t
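A rough sketch of what that can look like on a Debian package install - the
instance name "library", the paths and the option names below are from
memory rather than gospel, so please check the script's --help output on
your version first:

sudo koha-shell -c "perl /usr/share/koha/bin/cronjobs/sitemap.pl \
    --url https://opac.example.org \
    --dir /usr/share/koha/opac/htdocs" library

and then advertise the result in the OPAC's robots.txt so crawlers follow
the sitemap instead of wandering through opac-search.pl (the index file
name is also an assumption of mine - use whatever the script actually wrote):

Sitemap: https://opac.example.org/sitemapindex.xml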
Dear Michael
On 25/07/2024 13:28, Michael Kuhn wrote:
Hi Nigel
In such a case I would advise creating a sitemap - unfortunately this
Koha feature does not seem to be so well documented, but the following
may give you a start:
* https://lists.katipo.co.nz/public/koha/2020-November/055401.html
*
htt
While they do ignore robots.txt they do at least supply a recognizable
user agent that you can just block:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
RewriteRule "^.*" "-" [F]
Note that the second RewriteCond is required so that requests for the 403
error page itself aren't also blocked.
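A quick way to check that a block like this is doing what you expect (the
hostname is just a placeholder):

# should come back 403
curl -sI -A "facebookexternalhit/1.1" "https://opac.example.org/cgi-bin/koha/opac-search.pl?q=test" | head -1
# and without the bot UA the OPAC should still answer 200
curl -sI "https://opac.example.org/cgi-bin/koha/opac-search.pl?q=test" | head -1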
On 25/07/2024 13:55, Jason Boyer wrote:
While they do ignore robots.txt they do at least supply a recognizable
user agent that you can just block:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
RewriteRule "^.*"
We've had a couple of recent crashes I haven't yet had time to dig into.
This would explain it :/
And as I look now, I also see a bunch of Amazonbot requests, but I haven't
yet checked whether that bot would at least respect robots.txt.
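When I do get to it, the check will probably be something along these lines
(the log path and the standard combined log format are assumptions on my
part):

# which user agents are hammering opac-search.pl
awk -F'"' '/opac-search\.pl/ {print $6}' /var/log/apache2/*access.log | sort | uniq -c | sort -rn | head
# has Amazonbot fetched robots.txt at all?
grep -i amazonbot /var/log/apache2/*access.log | grep -c 'GET /robots\.txt'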
The really annoying thing about this is that the catalog is there to be public.
It'
Hi Nigel et al,
I recently noticed the load on our Koha server was getting ridiculously
high, and investigation showed that most of it was bot requests for
opac-search.pl (averaging about one a second!). I have managed to stop the
ones that were hurting us with robots.txt and I am fairly confident
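For anyone who wants to try the same, a cut-down sketch of the sort of
robots.txt rules involved - it lives in the OPAC document root
(/usr/share/koha/opac/htdocs on package installs, if I remember the path
correctly) and of course only helps with bots that honour it:

User-agent: *
Disallow: /cgi-bin/koha/opac-search.pl
# Crawl-delay is non-standard and ignored by some bots, but cheap to add
Crawl-delay: 10

The record detail pages stay crawlable, so the catalogue remains visible to
search engines; only the expensive search script is declared off limits.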
Hi Nigel,
My solution for that is a simple two-step process:
1) Use mod_sec to monitor the UA string of each incoming request, match it
against a list of UAs I don't want, and return an HTTP 406 the first time a
UA matches.
2) Have fail2ban monitor the Apache log for 406 responses and immediately
ban the offending IP.
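Roughly what that pair looks like in practice - the rule ID, file names, UA
list and ban time below are placeholders rather than my exact production
config:

# step 1: ModSecurity (v2 syntax), somewhere in the OPAC vhost
SecRule REQUEST_HEADERS:User-Agent "@rx (?i)(facebookexternalhit|Amazonbot)" \
    "id:1000100,phase:1,deny,status:406,msg:'Unwanted crawler UA'"

# step 2: /etc/fail2ban/filter.d/apache-406.conf
[Definition]
failregex = ^<HOST> \S+ \S+ \[[^\]]+\] "[A-Z]+ [^"]*" 406\b

# and /etc/fail2ban/jail.local
[apache-406]
enabled  = true
port     = http,https
filter   = apache-406
logpath  = /var/log/apache2/*access.log
maxretry = 1
bantime  = 86400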
On 25/07/2024 13:55, Jason Boyer wrote:
While they do ignore robots.txt they do at least supply a recognizable
user agent that you can just block:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
RewriteRule "^.*"
On 25/07/2024 23:31, Nigel Titley wrote:
On 25/07/2024 13:55, Jason Boyer wrote:
While they do ignore robots.txt they do at least supply a recognizable
user agent that you can just block:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
RewriteCond %{RE
You might skip mod_sec and do the detection with fail2ban's
apache-badbots filter, by changing its regex to the following (the spaces
ARE important, copy and paste it):
failregex = ^(?:\S+:\d+ )?<HOST> [^"]*"[A-Z]+ [^"]+" \d+ \d+ "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"
adding the bad bots to the start of the "badbots" regex like
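In practice that means a small .local override next to the stock filter -
the file names, ban time and UA list below are only examples; the stock
apache-badbots.conf already defines badbots and badbotscustom, so you only
need to override what you change:

# /etc/fail2ban/filter.d/apache-badbots.local
[Definition]
badbotscustom = facebookexternalhit|Amazonbot|EmailCollector
failregex = ^(?:\S+:\d+ )?<HOST> [^"]*"[A-Z]+ [^"]+" \d+ \d+ "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"

# /etc/fail2ban/jail.local
[apache-badbots]
enabled  = true
logpath  = /var/log/apache2/*access.log
maxretry = 1
bantime  = 86400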
Hello,
I have created an SQL query for newly entered books. I have a list
called New Books in the Lists module. In order to keep the list up to date,
I run the SQL query and save the result to my computer, get the barcode
numbers and add them to the New Books list in the User lists. These oper
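For reference, the query is roughly of this shape - a simplified sketch
against the standard Koha schema rather than the exact report, with a
30-day window purely as an example:

-- barcodes of items added in the last 30 days, newest first
SELECT items.barcode, biblio.title, items.dateaccessioned
FROM items
JOIN biblio ON biblio.biblionumber = items.biblionumber
WHERE items.dateaccessioned >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
ORDER BY items.dateaccessioned DESC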