Re: Solr node goes into recovery mode

Shawn Heisey Mon, 13 May 2019 12:34:30 -0700

On 5/13/2019 8:26 AM, Maulin Rathod wrote:

Recently we have been observing an issue where a Solr node (any random node) 
automatically goes into recovery mode and stops responding.

Do you KNOW that these Solr instances actually need a 60GB heap? That's a HUGE heap. When a full GC happens on a heap that large, it's going to be a long pause, and there's nothing that can be done about it.

We have enough memory allocated to Solr (60 GB) and the system also has enough 
memory (300 GB)...

As just mentioned, unless you are CERTAIN that you need a 60GB heap, which most users do not, don't set it that high. Any advice you read that says "set the heap to XX percent of the installed system memory" will frequently result in a setting that's incorrect for your specific setup.

And if you really DO need a 60GB heap, I would recommend either adding more servers and putting less of your index on each one, or splitting your replicas between two Solr instances each running a heap of 31GB or less -- as Erick mentioned in his reply.

We have analyzed the GC logs and found a GC pause of 29.6583943 seconds at 
the time the problem happened. Can this GC pause make the node unavailable 
or put it into recovery mode? Or could there be some other reason?

Please note we have set zkClientTimeout to 10 minutes (zkClientTimeout=600000) 
so that ZooKeeper will not consider this node unavailable during a long GC 
pause.

You can't actually set that timeout that high. I believe that ZooKeeper limits the session timeout to 20 times the tickTime, which is typically set to 2 seconds. So 40 seconds is typically the maximum you can have for that timeout. Solr's zkClientTimeout value is used to set ZooKeeper's session timeout.
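
To make the arithmetic concrete, here is a rough sketch of how the requested timeout gets clamped. It assumes the ZooKeeper server is running with its default bounds (minimum of 2 ticks, maximum of 20 ticks) and a tickTime of 2000 ms; the function and parameter names are made up for illustration and are not part of Solr or ZooKeeper.

```python
def effective_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                 min_ticks=2, max_ticks=20):
    """Clamp a requested session timeout to the range the server allows.

    Assumption: the server uses the default bounds of 2 and 20 ticks and
    has not overridden its minimum/maximum session timeout settings.
    """
    lower_ms = min_ticks * tick_time_ms   # 4000 ms with the defaults
    upper_ms = max_ticks * tick_time_ms   # 40000 ms with the defaults
    return max(lower_ms, min(requested_ms, upper_ms))


if __name__ == "__main__":
    # zkClientTimeout=600000 (10 minutes) is clamped down to the 40 second cap.
    print(effective_session_timeout_ms(600_000))
```

Under those assumptions, any zkClientTimeout above 40000 ends up behaving like 40000, so asking for ten minutes buys nothing beyond 40 seconds.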

And, as Erick also mentioned, there are other ways that a long GC pause can cause problems other than that specific timeout. SolrCloud is not going to work well with a huge heap ... eventually a full GC is going to happen, and if it takes more than a few seconds, it's going to cause issues.


Thanks,
Shawn
