Thanks for the reply.

Our Solr node normally uses 30-45 GB of heap, and hence we allocated a 60 GB 
heap.  We analyzed a heap dump and found that around 85% of the heap was used 
by org.apache.solr.uninverting.FieldCacheImpl.
--------------------
One instance of
"org.apache.solr.uninverting.FieldCacheImpl" loaded by 
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x48fe5e9b0" occupies 
19,72,04,15,160 (86.28%) bytes. The memory is accumulated in one instance of 
"java.util.HashMap$Node[]" loaded by "<system class loader>".
--------------------
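
The byte count in the report above uses Indian-style digit grouping, so it is 
easy to misread. A small sketch to convert it, assuming the reported percentage 
is relative to the total size of the objects in the dump (the helper name 
parse_grouped_int is only illustrative):

```python
def parse_grouped_int(text):
    """Parse an integer written with comma grouping (any grouping style)."""
    return int(text.replace(",", ""))


# Figures copied from the heap dump report above.
retained_bytes = parse_grouped_int("19,72,04,15,160")
share = 86.28 / 100  # assumption: fraction of the total heap in the dump

retained_gib = retained_bytes / 2**30
implied_total_gib = retained_bytes / share / 2**30

print(f"FieldCacheImpl retained: {retained_bytes} bytes (~{retained_gib:.1f} GiB)")
print(f"Implied total in dump: ~{implied_total_gib:.1f} GiB")
```

In plain digits the retained size is 19720415160 bytes, i.e. roughly 19.7 GB.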

Please note that we are not using any Solr caches, because in our scenario new 
documents are added to the index quite fast (at least 10 documents every 
second) and we need to reopen the searcher to make these new documents available.

We are not using docValues. As per our understanding, using docValues should 
improve query performance and should also reduce the memory requirement, as we 
use a lot of sorting/faceting in our queries. Please let me know your thoughts 
on this. Please also suggest any other ways to reduce the memory requirement or 
optimize performance.


Regards,

Maulin

-----Original Message-----
From: Shawn Heisey <[email protected]>
Sent: 14 May 2019 01:04
To: [email protected]
Subject: Re: Solr node goes into recovery mode

On 5/13/2019 8:26 AM, Maulin Rathod wrote:
> Recently we have been observing an issue where a Solr node (any random node) 
> automatically goes into recovery mode and stops responding.

Do you KNOW that these Solr instances actually need a 60GB heap?  That's a HUGE 
heap.  When a full GC happens on a heap that large, it's going to be a long 
pause, and there's nothing that can be done about it.

> We have enough memory allocated to Solr (60 GB) and the system also has enough 
> memory (300 GB)...

As just mentioned, unless you are CERTAIN that you need a 60GB heap, which most 
users do not, don't set it that high.  Any advice you read that says "set the 
heap to XX percent of the installed system memory" will frequently result in a 
setting that's incorrect for your specific setup.

And if you really DO need a 60GB heap, it would be recommended to either add 
more servers and put less of your index on each one, or to split your replicas 
between two Solr instances each running 31GB or less -- as Erick mentioned in 
his reply.

> We have analyzed the GC logs and found that there was a GC pause of 29.6583943 
> seconds when the problem happened. Can this GC pause cause the node to become 
> unavailable or go into recovery mode? Or could there be some other reason?

> Please note we have set zkClientTimeout to 10 minutes 
> (zkClientTimeout=600000) so that ZooKeeper will not consider this node 
> unavailable during long GC pauses.

You can't actually set that timeout that high.  I believe that ZooKeeper limits 
the session timeout to 20 times the tickTime, which is typically set to 2 
seconds.  So 40 seconds is typically the maximum you can have for that timeout.  
Solr's zkClientTimeout value is used to set ZooKeeper's session timeout.
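
To illustrate the limit described above, here is a minimal sketch of how the 
requested timeout gets clamped.  It assumes ZooKeeper's default bounds 
(minSessionTimeout = 2 x tickTime, maxSessionTimeout = 20 x tickTime) and a 
tickTime of 2000 ms; the function name negotiated_session_timeout is 
hypothetical and not part of Solr or ZooKeeper, and the real limits depend on 
what is configured in zoo.cfg.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Return the session timeout a ZooKeeper server would grant.

    Assumes the default bounds when min_ms/max_ms are not given:
    minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
    """
    if min_ms is None:
        min_ms = 2 * tick_time_ms
    if max_ms is None:
        max_ms = 20 * tick_time_ms
    # Clamp the client's request into the [min, max] range.
    return max(min_ms, min(requested_ms, max_ms))


requested = 600000  # zkClientTimeout from the original message
granted = negotiated_session_timeout(requested)
print(f"requested {requested} ms, granted {granted} ms")
```

With these assumed defaults, a request for 600000 ms is reduced to 40000 ms.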

And, as Erick also mentioned, there are other ways that a long GC pause can 
cause problems other than that specific timeout.  SolrCloud is not going to 
work well with a huge heap ... eventually a full GC is going to happen, and if 
it takes more than a few seconds, it's going to cause issues.

Thanks,
Shawn
