+1 for fail2ban

@Dmitri Maziuk if your Solr is behind Apache httpd then you may be
interested in mod_evasive, which worked well against XMLRPC attacks on
WordPress.

You can combo it with fail2ban

https://ejectdisc.org/2015/08/08/admin-a-wordpress-site-running-on-debian-linux-learn-how-to-protect-it-from-dos-xmlrpc-attacks-and-similar/
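A minimal sketch of what such a fail2ban setup could look like (the filter name, log path, and thresholds below are assumptions — adjust the regex and paths to your own Apache access-log format and Solr URL prefix):

```ini
# /etc/fail2ban/filter.d/solr-dos.conf  (hypothetical filter name)
[Definition]
# Match any request to a Solr-backed path; the rate limiting
# itself is done by maxretry/findtime in the jail below.
failregex = ^<HOST> .* "(GET|POST) /solr/

# /etc/fail2ban/jail.d/solr-dos.local
[solr-dos]
enabled  = true
filter   = solr-dos
logpath  = /var/log/apache2/access.log
# More than 60 matching requests within 60 seconds bans the IP for an hour.
maxretry = 60
findtime = 60
bantime  = 3600
```

With mod_evasive in front of it you get a fast in-process block, and fail2ban then bans repeat offenders at the firewall level.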

It sounds like your Solr is publicly exposed to the web. Yikes. An
alternative is to change the port it's running on to something
non-standard and random; these bots scan for well-known ports.
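For example, assuming a default standalone install (the port number here is arbitrary):

```
# Stop Solr on the default port and restart on a random high port
bin/solr stop -p 8983
bin/solr start -p 28983

# Or set it permanently in solr.in.sh (solr.in.cmd on Windows):
# SOLR_PORT=28983
```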

That's "security through obscurity" though and you should ideally be
running Solr behind some kind of "web application firewall".


On Thu, Jun 20, 2024, 4:56 PM Ohms, Jannis <j.o...@tu-braunschweig.de>
wrote:

> I work in a library, so yes, we have a similar problem: our Solr is
> used indirectly by a web application running on another server.
>
> We use https://wiki.archlinux.org/title/fail2ban to block IPs that
> exceed a given number of requests per minute.
> ________________________________
> From: Dmitri Maziuk <dmitri.maz...@gmail.com>
> Sent: Thursday, 20 June 2024 17:38:27
> To: users@solr.apache.org
> Subject: are bots DoS'ing anyone else's Solr?
>
> Hi all,
>
> the latest mole in the eternal whack-a-mole game with web crawlers
> (GPTBot) DoS'ed our Solr again & I took a closer look at the logs.
> Here's what it looks like is happening:
>
> - the bot is hitting a URL backed by Solr search and starts following
> all permutations of facets and "next page"s at a rate of 60+ hits/second.
> - Solr is not returning the results fast enough and the bot is dropping
> connections.
> - An INFO message is logged: jetty is "unable to write response, client
> closed connection or we are shutting down" -- IOException on the
> OutputStream: Closed.
>
> These go on for a while until:
>
> java.nio.file.FileSystemException:
>
> $PATH_TO\server\solr\preview_shard1_replica_n2\data\tlog\buffer.tlog.0000800034318988100:
> The process cannot access the file because it is being used by another
> process.
>   -- Different file suffix # on every one of those
>
> And eventually an update comes in and fails with
>
> ERROR (qtp173791568-23140) [c:preview s:shard1 r:core_node4
> x:preview_shard1_replica_n2] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: Error logging add =>
> org.apache.solr.common.SolrException: Error logging add
> at
> org.apache.solr.update.TransactionLog.write(TransactionLog.java:420)
> org.apache.solr.common.SolrException: Error logging add
>
> Caused by: java.io.IOException: There is not enough space on the disk
> ...
>
> At this point Solr is hosed. Admin page shows "no collections available"
> but does respond to queries; all queries from the website client (.NET)
> are failing.
>
> This is Solr 8.11.2 on Windows Server 2022 / Corretto JVM 11.
>
> So, questions: has anyone else seen this?
>
> What are the "buffer.tlog.xyz" files, do they have a size/count cap, and
> are they not getting cleaned up fast enough under this kind of load?
>
> The 400 GB disk is normally ~90% empty, so "not enough space on the disk"
> does not sound right. The logs do pile up when this happens and Java
> starts dumping gigabytes of stack traces, but they add up to a few
> hundred MB at most. There certainly was *some* free space when I got to
> it, and it's back to 99% free after a Solr restart.
>
> Any suggestions as to how to deal with this?
>
> (Obviously, I added "Disallow: /" to robots.txt for GPTBot, but that's
> only good until the next bot comes along.)
>
> TIA
> Dima
>
>