On 12/10/2021 12:38 PM, Scott wrote:
> Having a bit of a weird issue.
> We run a 4-node SolrCloud, version 8.6.2, and for the most part it's
> been going quite well for more than 2 years now. We have to restart
> the nodes occasionally to free up RAM, but I guess that's normal.
If you have to restart because Solr is using too much memory, then
something is not configured right. If the Java heap is sized
appropriately, and the machine is not handling software other than
Solr, it is pretty much impossible for a Java program like Solr to take
too much memory.
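To see where the memory is actually going, a quick check like the
following can help. This is a sketch, not from the original message:
the grep pattern assumes a standard bin/solr install, so adjust it for
your setup.

```shell
# Show the heap flags the running Solr JVM was started with
# (assumes a standard bin/solr install; adjust the pattern as needed).
ps aux | grep '[s]olr' | grep -oE '\-Xm[sx][0-9]+[gmkGMK]'

# Check overall memory and swap on the host; on a healthy Solr node,
# swap usage should stay near zero.
free -h
```

If the -Xmx value printed there is close to the machine's total RAM,
that is the first thing to fix.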
> Last night one of the nodes went into swap, used up all memory, and
> crashed. Somehow the way it crashed, it also removed all local
> cores/data. The cluster kept on chugging along, which was fine, but
> now I can't get the crashed node to resync with the others.
Assuming again that Solr is the only significant memory-using process
on the system, and the heap is sized appropriately, that system should
NEVER use significant amounts of swap.
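As a belt-and-suspenders measure, assuming a Linux host, you can also
tell the kernel to strongly avoid swapping. This is a sketch; the value
of 1 is illustrative, not something from this thread.

```shell
# Sketch, assuming Linux: discourage the kernel from swapping out a
# JVM's pages. A low vm.swappiness is commonly used for JVM services;
# 1 is an illustrative value.
sysctl -w vm.swappiness=1

# Persist the setting across reboots:
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
```

This only masks the symptom, though; the real fix is sizing the heap
so the machine never needs swap in the first place.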
I'm betting that you have configured Solr with a max heap size that's
too large for the system it's running on. Because Java uses a garbage
collection memory model, almost any Java program will eventually use the
entire max heap size it has been given, even if it does not actually
need that much memory. This is expected.
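For reference, the max heap for Solr is normally set in solr.in.sh.
The 8g below is purely an illustrative value; the right size depends on
your index and query load, and should leave plenty of RAM free for the
OS disk cache.

```shell
# In solr.in.sh (often /etc/default/solr.in.sh on a service install):
# set a fixed, deliberately modest heap. 8g is illustrative only.
SOLR_HEAP="8g"

# Equivalent explicit form, setting min and max to the same value:
# SOLR_JAVA_MEM="-Xms8g -Xmx8g"
```

Setting -Xms equal to -Xmx avoids heap resizing, and keeping the total
well under physical RAM is what keeps the node out of swap.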
The most likely reason for a SolrCloud node to delete all its cores is
that it connects to a ZooKeeper ensemble that does not contain the
SolrCloud cluster config data, or contains config for a different
cluster. See this issue:
https://issues.apache.org/jira/browse/SOLR-13396
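One way to guard against a node attaching to the wrong cluster config
is to give each SolrCloud cluster its own ZooKeeper chroot. A sketch,
with illustrative hostnames and an illustrative /solr-prod chroot:

```shell
# Create the chroot once (any Solr node can do this):
bin/solr zk mkroot /solr-prod -z zk1:2181,zk2:2181,zk3:2181

# Then point every node at the chrooted path, e.g. in solr.in.sh:
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr-prod"
```

With a chroot in place, a node that is misconfigured to a bare
ensemble (or the wrong chroot) fails to find its cluster state rather
than silently adopting someone else's.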
Thanks,
Shawn