Hi everyone,

I'm running Solr 7.4.0 and have a collection on 4 nodes (2 shards, replication factor = 2). I'm experiencing an issue where random nodes crash when I submit large batches to be indexed (>500,000 documents). I've been able to keep things running if I keep an eye on it and restart nodes after they crash. Sometimes I end up with a replica that won't recover, which I fix by dropping the replica and re-adding it.

I've also had no crashes when I keep insert batches under 500,000 documents, so that's my workaround for now.
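For reference, a minimal sketch of the kind of chunked load I mean (not my real loader; it assumes the input is one JSON document per line, a collection named "mycollection" on localhost, and that jq is available):

# Minimal sketch, placeholders only: docs.jsonl holds one JSON document
# per line; each chunk is wrapped into a JSON array and posted to /update.
split -l 250000 docs.jsonl chunk_          # stay well under the ~500k mark
for f in chunk_*; do
  jq -s '.' "$f" | curl -s -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    'http://localhost:8983/solr/mycollection/update'
done
# single commit once everything is in
curl -s 'http://localhost:8983/solr/mycollection/update?commit=true'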
I'm wondering if anyone can help point me in the right direction for troubleshooting this issue, so that I can send upwards of 100 million documents at a time.

From the logs, I have the following errors:

(SolrException.java:148) - java.io.EOFException
org.apache.solr.update.ErrorReportingConcurrentUpdateSolrClient (StreamingSolrClients.java:147) - error

I did see this: https://solr.apache.org/guide/7_3/taking-solr-to-production.html#file-handles-and-processes-ulimit-settings

I'm running RHEL; does this look correctly configured?

ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1544093
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

cat /proc/sys/fs/file-max
39208945

On my next attempt I was thinking of scheduling a job to log the output of cat /proc/sys/fs/file-nr every 5 minutes or so, to verify that this setting is not the issue. Any other ideas?

TIA,
Jon
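P.S. The monitoring job I have in mind is roughly this sketch (the log path, script location, and the way I find the Solr PID are placeholders for my environment):

#!/bin/bash
# Sketch of the 5-minute job: record system-wide and Solr-process open-file
# counts with a timestamp. Run as the solr user (or root) so /proc/<pid>/fd
# is readable.
LOG=/var/log/solr-file-handles.log
TS=$(date -Iseconds)
PID=$(pgrep -f start.jar | head -n 1)   # assumes the stock Jetty start.jar process
{
  echo "$TS file-nr: $(cat /proc/sys/fs/file-nr)"
  if [ -n "$PID" ]; then
    echo "$TS solr-fds: $(ls "/proc/$PID/fd" | wc -l)"
  fi
} >> "$LOG"

run from cron with something like:

*/5 * * * * /path/to/log-solr-file-handles.sh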