Send Koha mailing list submissions to
koha@lists.katipo.co.nz
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.katipo.co.nz/mailman/listinfo/koha
or, via email, send a message with subject or body 'help' to
koha-requ...@lists.katipo.co.nz
You can reach the person managing the list at
koha-ow...@lists.katipo.co.nz
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Koha digest..."
Today's Topics:
1. how to avoid high cpu uses due to web crawlers (vinod mishra)
2. Re: how to avoid high cpu uses due to web crawlers
(Nirmit Krishnatray)
3. Re: how to avoid high cpu uses due to web crawlers (vinod mishra)
4. Re: how to avoid high cpu uses due to web crawlers
(Wagner, Alexander)
5. koha-US Board Meeting Minutes for January 10, 2024
(Kristi Krueger)
----------------------------------------------------------------------
Message: 1
Date: Wed, 10 Jan 2024 12:51:38 +0530
From: vinod mishra<mishrav...@gmail.com>
To: Koha<Koha@lists.katipo.co.nz>
Subject: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:
<cagluwirdash66xeqoznjhxxeigiigvdtv8p7w5uihm3h93m...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hello
I found that an IP 47.76.35.19 is hitting my opac continuously, due to
which CPU use is very high, and it makes the entire Koha opac and staff
client very slow.
I tried following the links but could not resolve the issue.
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042#c3
https://wiki.koha-community.org/wiki/Koha_Tuning_Guide
I am also not able to locate the .htaccess file on my Ubuntu 18.04 server with
Koha 20.04.
Can anyone suggest how to resolve this?
With Regards,
Vinod Kumar Mishra,
(Ph.D, MLISC, MA, B.Sc, DCA)
Assistant Librarian,
Biju Patnaik Central Library (BPCL),
NIT Rourkela,
Sundergadh-769008,
Odisha,
India.
Mob:91+9439420860
URL: https://vinod.itshelp.co.in/
ORCID ID: https://orcid.org/0000-0003-4666-7874
Scopus ID: 57223138343
*"Spiritual relationship is far more precious than physical. Physical
relationship divorced from spiritual is body without soul" -- Mahatma
Gandhi*
------------------------------
Message: 2
Date: Wed, 10 Jan 2024 07:46:17 +0000
From: Nirmit Krishnatray<nir...@edutech.com>
To: vinod mishra<mishrav...@gmail.com>, Koha
<Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:<27ebb0c12a164436a0b59c8be7e46...@edutech.com>
Content-Type: text/plain; charset="utf-8"
Hi sir,
Try blocking the IP address that is hitting your server.
Best Regards
Nirmit Krishnatray | Associate Manager - Professional Services
DBS Business Center, World Trade Tower, Barakhamba Lane,Connaught Place,
New Delhi – 110001
M: +91 9003078515 | E:nir...@edutech.com
Edutech India | LinkedIn | Twitter | Facebook | Youtube
------------------------------
Message: 3
Date: Wed, 10 Jan 2024 13:23:31 +0530
From: vinod mishra<mishrav...@gmail.com>
To: Nirmit Krishnatray<nir...@edutech.com>
Cc: Koha<Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:
<cagluwitpogzfrz84vax5bzvsa7ejz+rivfmtdevuy1c-qtg...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Thanks, that is the ultimate solution. I am also looking for any other
effective approach in case this happens again. Creating a robots.txt file
seems easy, but identifying which crawler is responsible from its IP address
alone is difficult.
------------------------------
Message: 4
Date: Wed, 10 Jan 2024 10:19:55 +0100 (CET)
From: "Wagner, Alexander"<alexander.wag...@desy.de>
To: vinod mishra<mishrav...@gmail.com>
Cc: Koha<Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:<2002597982.8686470.1704878395276.javamail.zim...@desy.de>
Content-Type: text/plain; charset=utf-8
Hi!
> I found that an IP 47.76.35.19 is hitting my opac continuously, due to
> which CPU use is very high, and it makes the entire Koha opac and staff
> client very slow.
This does not look like a legitimate crawler, so a robots.txt will most likely
not help: a client like this will probably not respect it anyway.
> I am also not able to locate the .htaccess file on my Ubuntu 18.04 server with
> Koha 20.04.
> Can anyone suggest how to resolve this?
`.htaccess` files do not exist by default; you would have to create one in the
appropriate directory, with proper permissions and ownership, using your favourite
text editor. They are basically per-directory configuration files read by your
web server (where `AllowOverride` permits it), and they can carry access rules.
In other words, you could either use one of those or put a rule in your Apache configs.
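A rule in the Apache configs that refuses a single address could look like the
following. This is only a minimal sketch, assuming Apache 2.4 (the version
shipped with Ubuntu 18.04) and that the block sits inside the virtual host that
serves the OPAC. The `<Location "/">` scope is an assumption, and the address is
the one reported in the original question.

```
# Hypothetical placement: inside the <VirtualHost> block that serves the OPAC
<Location "/">
    <RequireAll>
        # Allow everyone ...
        Require all granted
        # ... except the address that is hammering the OPAC
        Require not ip 47.76.35.19
    </RequireAll>
</Location>
```

Apache has to be reloaded for the change to take effect. Refused clients
receive a 403 Forbidden response.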
I am no expert in either, but on one of our current non-Koha systems we use
something like
```
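# Turn bad IPs away. RewriteMap is only valid in server or virtual-host
# context, not in .htaccess.
RewriteMap hosts-deny "txt:/opt/invenio/var/tmp/hosts-deny.txt"
# Match if the client address or the X-Forwarded-For header is listed in the map
RewriteCond "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND" [OR]
RewriteCond "${hosts-deny:%{HTTP:X-Forwarded-For}|NOT-FOUND}" "!=NOT-FOUND"
# Answer 429 (Too Many Requests) and stop processing further rules
RewriteRule .* - [R=429,L]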
```
in the Apache configs. This refers to a text file called `hosts-deny.txt`, in this
case in the somewhat odd path `/opt/invenio/var/tmp/`, that lists the IP addresses
that should be turned away. You could in principle create such a file anywhere your
Apache can read it. This makes it a bit easier to handle unwanted "crawlers", as you
just add the offending IPs there.
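For illustration, a sketch of what such a map file could look like. mod_rewrite
parses a `txt:` map as key/value pairs rather than as a plain list, so each
address needs at least a dummy value such as `-`. The first entry is the address
from the original question; the second is a documentation placeholder, not a
real offender.

```
# hosts-deny.txt: one "address value" pair per line
# The value is never used by the rules; "-" is just a dummy.
47.76.35.19  -
192.0.2.10   -
```

Note that the `X-Forwarded-For` lookup only matches when the header holds
exactly one listed address; a header carrying a comma-separated chain of
proxies would not be found in the map.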
HTH.