[Koha] Problems with the facebook web crawler

2024-07-25 Thread Nigel Titley
Is anyone else getting problems with the facebook web crawler hammering their OPAC search function? This has been happening on and off for a couple of months but set in with a vengeance a couple of days ago. The crawler is hitting us with many OPAC search queries, beyond the capacity of our sy

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Michael Kuhn
Hi Nigel In such a case I would advise creating a sitemap - unfortunately this Koha feature does not seem to be well documented, but the following may give you a start: * https://lists.katipo.co.nz/public/koha/2020-November/055401.html * https://wiki.koha-community.org/wiki/Commands_provided_by_t
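[Editor's note: for package installs, the sitemap feature Michael refers to is exposed through the `koha-sitemap` command from the koha-common package. A minimal sketch follows; the instance name "library" is a placeholder, and exact flags should be checked against `man koha-sitemap` on your version.

```shell
# Generate sitemap files for a Koha instance named "library"
# (instance name is a placeholder - substitute your own).
sudo koha-sitemap --generate library

# The generated sitemap files can then be referenced from robots.txt,
# giving crawlers a cheap way to index records without walking
# opac-search.pl.
```
]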

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Nigel Titley
Dear Michael On 25/07/2024 13:28, Michael Kuhn wrote: Hi Nigel In such a case I would advise to create a sitemap - unfortunately this Koha feature seems not so well documented, but the following may give you a start: * https://lists.katipo.co.nz/public/koha/2020-November/055401.html * htt

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Jason Boyer
While they do ignore robots.txt they do at least supply a recognizable user agent that you can just block: RewriteEngine on RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here" RewriteCond %{REQUEST_URI} "!403\.pl" [NC] RewriteRule "^.*" "-" [F] Note that second RewriteCond is r
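[Editor's note: Jason's snippet is cut off by the archive. The same rule, written out with comments, might look like the sketch below for an OPAC virtual host; the bot list is a placeholder to extend, and the `403.pl` exclusion (presumably the point of the truncated note) keeps the rule from also forbidding Koha's own error page, which would loop.

```apache
# OPAC virtual host - sketch; extend the alternation with the
# user agents you actually see in your access log.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here" [NC]
# Do not block the error page itself, or the 403 response loops
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
# Return 403 Forbidden to matching crawlers
RewriteRule "^.*" "-" [F]
```
]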

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Nigel Titley
On 25/07/2024 13:55, Jason Boyer wrote: While they do ignore robots.txt they do at least supply a recognizable user agent that you can just block: RewriteEngine on RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here" RewriteCond %{REQUEST_URI} "!403\.pl" [NC] RewriteRule "^.*"

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Coehoorn, Joel
We've had a couple recent crashes I haven't yet had time to dig into. This would explain it :/ And as I look now, I also see a bunch of AmazonBot, but I haven't yet checked whether this would at least respect robots.txt The really annoying thing about this is the catalog is there to be public. It'

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Chris Brown
Hi Nigel et al, I recently noticed the load on our Koha server was getting ridiculously high, and investigation showed that most of it was bot requests for opac-search.pl (averaging about one a second!). I have managed to stop the ones that were hurting us with robots.txt and I am fairly confident
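[Editor's note: for the bots that do honour robots.txt, disallowing the search script is usually enough. A minimal sketch, assuming the standard Koha OPAC URL layout; note that facebookexternalhit ignores robots.txt, so it still needs blocking at the web server.

```
# robots.txt at the OPAC document root - minimal sketch.
# Well-behaved crawlers will stop hitting the expensive search script.
User-agent: *
Disallow: /cgi-bin/koha/opac-search.pl
```
]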

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Indranil Das Gupta
Hi Nigel, My solution for that is a simple two-step process: 1) use mod_sec to monitor and match the UA string of each incoming request against a list of UAs I don't want, returning an HTTP 406 the first time a UA matches. 2) Have fail2ban monitor the apache log for 406 and immediately ba
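[Editor's note: the fail2ban half of Indranil's two-step setup could be wired up roughly as below. This is a sketch, not his actual config: the jail and filter names are hypothetical, and logpath depends on how Apache logs for your Koha instance.

```ini
# /etc/fail2ban/jail.d/koha-406.local - hypothetical jail that bans
# on the first 406 response emitted by the mod_security rule.
[koha-406]
enabled  = true
port     = http,https
filter   = koha-406
logpath  = /var/log/apache2/*access.log
maxretry = 1
bantime  = 86400

# /etc/fail2ban/filter.d/koha-406.conf - matches any request that
# was answered with HTTP 406.
[Definition]
failregex = ^<HOST> .*" 406 \d+
```
]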

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Nigel Titley
On 25/07/2024 13:55, Jason Boyer wrote: While they do ignore robots.txt they do at least supply a recognizable user agent that you can just block: RewriteEngine on RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here" RewriteCond %{REQUEST_URI} "!403\.pl" [NC] RewriteRule "^.*"

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Nigel Titley
On 25/07/2024 23:31, Nigel Titley wrote: On 25/07/2024 13:55, Jason Boyer wrote: While they do ignore robots.txt they do at least supply a recognizable user agent that you can just block: RewriteEngine on RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here" RewriteCond %{RE

Re: [Koha] Problems with the facebook web crawler

2024-07-25 Thread Hector Gonzalez Jaime
You might skip mod_sec and do the detection with fail2ban's apache-badbots, by changing its regex to the following (the spaces ARE important, copy and paste it): failregex = ^(?:\S+:\d+ )?<HOST> [^"]*"[A-Z]+ [^"]+" \d+ \d+ "[^"]*" "[^"]*(?:|)[^"]*" adding the bad bots to the start of the "badbots" regex like
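[Editor's note: the archive appears to have stripped fail2ban's `<HOST>` token from the regex as if it were markup; it is restored above. A regex of that shape can be sanity-checked outside fail2ban by substituting `<HOST>` with a capture group and matching it against a combined-format log line. The bot alternation below is illustrative, filled in where the archive left the `(?:|)` group empty.

```python
import re

# Hector's failregex shape, with fail2ban's <HOST> token restored.
# The (facebookexternalhit|Amazonbot) alternation is illustrative.
failregex = (r'^(?:\S+:\d+ )?<HOST> [^"]*"[A-Z]+ [^"]+" \d+ \d+ '
             r'"[^"]*" "[^"]*(?:facebookexternalhit|Amazonbot)[^"]*"')

# fail2ban substitutes <HOST> itself at load time; emulate that here
# so the regex can be tested stand-alone.
pattern = re.compile(failregex.replace("<HOST>", r"(?P<host>\S+)"))

line = ('203.0.113.9 - - [25/Jul/2024:13:55:00 +0000] '
        '"GET /cgi-bin/koha/opac-search.pl?q=test HTTP/1.1" 200 5120 "-" '
        '"facebookexternalhit/1.1 '
        '(+http://www.facebook.com/externalhit_uatext.php)"')

m = pattern.match(line)
print(m.group("host") if m else "no match")  # → 203.0.113.9
```
]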

[Koha] New Books List

2024-07-25 Thread Kazım ŞENTÜRK
Hello, I have created an SQL query for newly entered books. I have a list called New Books in the Lists module. To keep the list up to date, I run the SQL query, save the results to my computer, get the barcode numbers and add them to the New Books list in the User lists. These oper
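[Editor's note: Kazım's query is not shown, but a "new books" report is commonly built on `items.dateaccessioned` in the standard Koha schema. A hedged sketch, to be verified against your Koha version's schema:

```sql
-- Items accessioned in the last 30 days, using the standard Koha
-- columns items.barcode / items.dateaccessioned and biblio.title.
SELECT b.title, i.barcode, i.dateaccessioned
FROM items i
JOIN biblio b ON b.biblionumber = i.biblionumber
WHERE i.dateaccessioned >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
ORDER BY i.dateaccessioned DESC;
```
]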