Hello,
I've observed a concerning issue with our Koha server, where multiple bots are causing downtime and significantly increasing CPU usage. Some of the problematic bots include:

 * PetalBot;+https://webmaster.petalsearch.com/site/petalbot
 * MJ12bot/v1.4.8; http://mj12bot.com/
 * SemrushBot/7~bl; +http://www.semrush.com/bot.html

I tried to address this by adding a robots.txt file, but it has not stopped these bots from causing disruptions. Their IP addresses also keep changing, which makes it impractical to block them individually.

Furthermore, I've noticed that the Apache2 server is generating internal requests, and I'm uncertain about the cause and purpose of these requests:

` ::1 - - [15/Jan/2024:12:40:41 +0530] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f (internal dummy connection)" `


I need your expertise to stop the bot traffic that is degrading server performance and driving up CPU usage, and to prevent unauthorized internal requests.

Thanks and Regards,
Amar Londhe
Full-Stack Developer

On 11/01/24 4:30 am, koha-requ...@lists.katipo.co.nz wrote:

Today's Topics:

    1. how to avoid high cpu uses due to web crawlers (vinod mishra)
    2. Re: how to avoid high cpu uses due to web crawlers
       (Nirmit Krishnatray)
    3. Re: how to avoid high cpu uses due to web crawlers (vinod mishra)
    4. Re: how to avoid high cpu uses due to web crawlers
       (Wagner, Alexander)
    5. koha-US Board Meeting Minutes for January 10, 2024
       (Kristi Krueger)


----------------------------------------------------------------------

Message: 1
Date: Wed, 10 Jan 2024 12:51:38 +0530
From: vinod mishra<mishrav...@gmail.com>
To: Koha<Koha@lists.katipo.co.nz>
Subject: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:
        <cagluwirdash66xeqoznjhxxeigiigvdtv8p7w5uihm3h93m...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hello

I found that the IP 47.76.35.19 is hitting my OPAC continuously. As a
result, CPU usage is very high, and the entire Koha OPAC and staff
client are very slow.

I tried following the links below but could not resolve the issue.

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042#c3
https://wiki.koha-community.org/wiki/Koha_Tuning_Guide

I am also not able to locate the file .htaccess in my Ubuntu 18.04 with
Koha 20.04.
Can anyone tell me how to resolve this?

With Regards,

Vinod Kumar Mishra,
(Ph.D, MLISC, MA, B.Sc, DCA)
Assistant Librarian,
Biju Patnaik Central Library (BPCL),
NIT Rourkela,
Sundergadh-769008,
Odisha,
India.
Mob:91+9439420860
URL: https://vinod.itshelp.co.in/
ORCID ID: https://orcid.org/0000-0003-4666-7874
Scopus ID: 57223138343

*"Spiritual relationship is far more precious than physical. Physical
relationship divorced from spiritual is body without soul" -- Mahatma
Gandhi*


------------------------------

Message: 2
Date: Wed, 10 Jan 2024 07:46:17 +0000
From: Nirmit Krishnatray<nir...@edutech.com>
To: vinod mishra<mishrav...@gmail.com>, Koha
        <Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:<27ebb0c12a164436a0b59c8be7e46...@edutech.com>
Content-Type: text/plain; charset="utf-8"

Hi sir,

Try blocking the IP that is hitting your server.

Best Regards
Nirmit Krishnatray | Associate Manager - Professional Services
DBS Business Center, World Trade Tower, Barakhamba Lane,Connaught Place,
New Delhi – 110001
M: +91 9003078515 | E:nir...@edutech.com
Edutech India  | LinkedIn  | Twitter  |  Facebook  |  Youtube



------------------------------

Message: 3
Date: Wed, 10 Jan 2024 13:23:31 +0530
From: vinod mishra<mishrav...@gmail.com>
To: Nirmit Krishnatray<nir...@edutech.com>
Cc: Koha<Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:
        <cagluwitpogzfrz84vax5bzvsa7ejz+rivfmtdevuy1c-qtg...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Thanks, blocking the IP is the ultimate solution. I am also looking for any
other effective solution in case this happens again in future. Creating a
robots.txt file seems easy, but working out which crawler is hitting the
server is difficult from the IP alone.



------------------------------

Message: 4
Date: Wed, 10 Jan 2024 10:19:55 +0100 (CET)
From: "Wagner, Alexander"<alexander.wag...@desy.de>
To: vinod mishra<mishrav...@gmail.com>
Cc: Koha<Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID:<2002597982.8686470.1704878395276.javamail.zim...@desy.de>
Content-Type: text/plain; charset=utf-8

Hi!

> I found that an IP 47.76.35.19 is hitting my opac continuously, due to
> which CPU use is very high, and it makes the entire Koha opac and staff
> client very slow.

This does not look like a legitimate crawler, so you probably can't tackle
this guy with a robots.txt: most likely it will not respect it anyway.

> I am also not able to locate the file .htaccess in my ubuntu 18.04 with
> koha 20.04
> Can anyone tell me how to resolve this?

`.htaccess` files do not exist by default; you would have to create one in the
appropriate place, with proper permissions and ownership, using your favourite
text editor. They are per-directory configuration files read by your web
server, so they can act much like folder-based firewall rules. In other words,
you could either use one of those or put a rule in your Apache configs.
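
For the `.htaccess` route, a minimal sketch of what such a file might contain is given here. It assumes Apache 2.4 and that the enclosing `<Directory>` section sets `AllowOverride AuthConfig` (or `All`), which I cannot confirm for a Koha install; the address is the one reported in this thread.

```
# Sketch only: refuse one client address from a .htaccess file (Apache 2.4).
# Assumption: AllowOverride AuthConfig (or All) is set for this directory;
# with AllowOverride None the file is simply not read.
<RequireAll>
    # Let everybody in ...
    Require all granted
    # ... except the offending address
    Require not ip 47.76.35.19
</RequireAll>
```

The same `<RequireAll>` block should also work inside a `<Directory>` or `<Location>` section of the main Apache configuration, which avoids depending on `AllowOverride` at all.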

I am no expert in either, but on one of our current (non-Koha) systems we use
something like

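```
# Turn bad IPs away. RewriteMap is only valid in server or virtual-host
# config, not in .htaccess. RewriteEngine is needed once if not already on.
RewriteEngine On
# Text map that is looked up with the client address as the key
RewriteMap hosts-deny "txt:/opt/invenio/var/tmp/hosts-deny.txt"
# Match if the connecting address or the X-Forwarded-For header is in the map
RewriteCond   "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND" [OR]
RewriteCond   "${hosts-deny:%{HTTP:X-Forwarded-For}|NOT-FOUND}" "!=NOT-FOUND"
# Answer 429 (Too Many Requests) and stop rewriting
RewriteRule .* - [R=429,L]
```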

in the Apache configs. This refers to a text file called `hosts-deny.txt`, in
this case in the somewhat funny path `/opt/invenio/var/tmp/`, that lists the
IP addresses that should be turned away. You could in principle create such a
file anywhere your Apache can read it. As far as I know, the `txt:` map format
expects a key and a value on each line, so each entry would be an IP followed
by some placeholder value. This makes it a bit easier to handle unwanted
"crawlers", as you just add the offending IPs there.
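
If the unwanted clients keep changing addresses but send a recognisable User-Agent, a similar rule could key on that header instead of the IP. This is a minimal sketch, not something we run ourselves: it assumes Apache 2.4 with mod_rewrite enabled, the bot names are simply the ones reported elsewhere in this thread, and the pattern is only illustrative.

```
# Sketch only: turn clients away by User-Agent rather than by IP.
# Assumption: mod_rewrite is enabled and this sits in the server or
# virtual-host config that serves the OPAC.
RewriteEngine On
# [NC] makes the match case-insensitive; extend the alternation as needed.
RewriteCond "%{HTTP_USER_AGENT}" "(PetalBot|MJ12bot|SemrushBot)" [NC]
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule "^" "-" [F]
```

Bear in mind that the User-Agent header is chosen by the client, so this only stops bots that identify themselves honestly. A client that fakes a browser string would still need the IP-based list.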

HTH.

_______________________________________________

Koha mailing list  http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
