Thanks, Shawn. Most people I've talked to treat restarting Solr every month 
or so as a given, so your comments are encouraging.

These nodes have 32 GB of RAM:

real memory  = 34359738368 (32768 MB)
avail memory = 33370628096 (31824 MB)

And here's what I have in my Solr config:

SOLR_JAVA_MEM="-Xms14512m -Xmx16512m"

I thought I'd keep the heap at about half of available RAM, but the node still 
goes into swap...
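
As a quick sanity check on those numbers, here is a minimal sketch in Python. 
The variable names are just for illustration; the values are copied from the 
memory output and SOLR_JAVA_MEM setting above.

```python
# Values copied from the boot messages and SOLR_JAVA_MEM above.
avail_mb = 31824  # "avail memory" reported by the OS, in MB
xms_mb = 14512    # -Xms (initial heap), in MB
xmx_mb = 16512    # -Xmx (maximum heap), in MB

# Fraction of available memory the heap may grow to.
heap_fraction = xmx_mb / avail_mb

# What remains for everything outside the Java heap.
remaining_mb = avail_mb - xmx_mb

print(f"max heap is {heap_fraction:.1%} of available memory")
print(f"{remaining_mb} MB remain outside the Java heap")
```

So the max heap is slightly over half of available memory, which leaves 
roughly 15 GB for everything outside the Java heap.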

Thank you for your help

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org> 
Sent: Saturday, December 11, 2021 1:25 AM
To: users@solr.apache.org
Subject: Re: Solr Cloud Node re-join issue

On 12/10/2021 12:38 PM, Scott wrote:
> Having a bit of a weird issue.
> 
> We run a 4 node Solr Cloud, version 8.6.2, and for the most part it's 
> been going quite well for more than 2 years now. We have to restart 
> them occasionally to free up RAM but I guess that's normal.

If you have to restart because it's using too much memory, then something is 
not configured right.  If the Java heap is sized appropriately, and the machine 
is not being used to handle software other than Solr, it is pretty much 
impossible for a Java program like Solr to take too much memory.

> Last night one of the nodes went into swap, used up all memory and crashed.
> Somehow the way it crashed, it also removed all local cores/data. The 
> cluster kept on chugging along which was fine, but now I can't get the 
> crashed node to resync with the others.

Assuming again that Solr is the only significant memory-using process on the 
system, and the heap is sized appropriately, then that system should NEVER use 
significant amounts of swap.

I'm betting that you have configured Solr with a max heap size that's too large 
for the system it's running on.  Because Java uses a garbage collection memory 
model, almost any Java program will eventually use the entire max heap size it 
has been given, even if it does not actually need that much memory.  This is 
expected.

The most likely reason for a SolrCloud node to delete all cores is that it 
connects to a ZooKeeper ensemble that does not contain SolrCloud cluster config 
data, or contains a cluster config that's not the correct one.  See this issue:

https://issues.apache.org/jira/browse/SOLR-13396

Thanks,
Shawn


