Thanks for the reply. Our Solr node normally uses 30-45 GB of heap, so we allocated a 60 GB heap. We analyzed a heap dump and found that around 86% of the heap was used by org.apache.solr.uninverting.FieldCacheImpl:

--------------------
One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x48fe5e9b0" occupies 19,72,04,15,160 (86.28%) bytes. The memory is accumulated in one instance of "java.util.HashMap$Node[]" loaded by "<system class loader>".
--------------------
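As a rough cross-check, the absolute size and the percentage in the MAT report above can be combined to estimate the total heap captured in the dump. This is a minimal sketch of plain arithmetic under two assumptions: that "19,72,04,15,160" is the Indian-style grouping of 19,720,415,160 bytes, and that MAT's 86.28% is a share of all objects in the dump (MAT rounds that figure, so the result is approximate). The class and method names are hypothetical.

```java
/** Back-of-envelope arithmetic on the Eclipse MAT figures quoted above (hypothetical helper). */
final class HeapDumpMath {
    static final double GIB = 1024.0 * 1024.0 * 1024.0;

    /** Estimate the total heap in the dump from one retained size and its share (0 < share <= 1). */
    static double totalBytes(long retainedBytes, double share) {
        if (share <= 0.0 || share > 1.0) {
            throw new IllegalArgumentException("share must be in (0, 1]");
        }
        return retainedBytes / share;
    }

    /** Convert bytes to GiB (2^30 bytes). */
    static double toGiB(double bytes) {
        return bytes / GIB;
    }

    public static void main(String[] args) {
        long fieldCacheBytes = 19_720_415_160L; // "19,72,04,15,160" from the MAT report (assumed grouping)
        double share = 0.8628;                  // 86.28%, as rounded by MAT
        double total = totalBytes(fieldCacheBytes, share);
        System.out.printf("FieldCache: %.2f GiB%n", toGiB(fieldCacheBytes));
        System.out.printf("Estimated heap in dump: %.2f GiB%n", toGiB(total));
    }
}
```

Run as-is, it should report about 18.37 GiB for the FieldCache and about 21.29 GiB for everything in the dump. If the assumptions hold, the objects captured in that dump total roughly a third of the 60 GB heap that was allocated.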
Please note that we are not using any Solr caches: in our scenario new documents are added to the index quite fast (at least 10 documents per second), and we need to open a new searcher to make these documents available.

We are not using docValues. As we understand it, enabling docValues should improve query performance and also reduce the memory requirement, since our queries use a lot of sorting and faceting. Please let me know your thoughts on this, and please suggest any other ways to reduce the memory requirement or optimize performance.

Regards,
Maulin

-----Original Message-----
From: Shawn Heisey <[email protected]>
Sent: 14 May 2019 01:04
To: [email protected]
Subject: Re: Solr node goes into recovery mode

On 5/13/2019 8:26 AM, Maulin Rathod wrote:
> Recently we are observing an issue where a Solr node (any random node)
> automatically goes into recovery mode and stops responding.

Do you KNOW that these Solr instances actually need a 60GB heap? That's a HUGE heap. When a full GC happens on a heap that large, it's going to be a long pause, and there's nothing that can be done about it.

> We have enough memory allocated to Solr (60 GB) and the system also has
> enough memory (300 GB)...

As just mentioned, unless you are CERTAIN that you need a 60GB heap, which most users do not, don't set it that high. Any advice you read that says "set the heap to XX percent of the installed system memory" will frequently result in a setting that's incorrect for your specific setup.

And if you really DO need a 60GB heap, it would be recommended to either add more servers and put less of your index on each one, or to split your replicas between two Solr instances each running 31GB or less, as Erick mentioned in his reply.

> We have analyzed GC logs and found a GC pause of 29.6583943 seconds when
> the problem happened. Can this GC pause make the node unavailable or put
> it into recovery mode, or could there be some other reason?
> Please note we have set zkClientTimeout to 10 minutes
> (zkClientTimeout=600000) so that ZooKeeper will not consider this node
> unavailable during long GC pauses.

You can't actually set that timeout that high. I believe that ZooKeeper limits the session timeout to 20 times the tickTime, which is typically set to 2 seconds. So 40 seconds is typically the maximum you can have for that timeout. Solr's zkClientTimeout value is used to set ZooKeeper's session timeout.

And, as Erick also mentioned, a long GC pause can cause problems in ways other than that specific timeout. SolrCloud is not going to work well with a huge heap ... eventually a full GC is going to happen, and if it takes more than a few seconds, it's going to cause issues.

Thanks,
Shawn
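The ceiling Shawn describes can be made concrete. A ZooKeeper server clamps the session timeout a client requests into the range from minSessionTimeout to maxSessionTimeout, and when those are not set in zoo.cfg they default to 2 and 20 times tickTime. The hypothetical class below is a minimal sketch of that clamping (not ZooKeeper's own code), assuming the defaults and a 2-second tickTime.

```java
/** Sketch of how a ZooKeeper server bounds a requested session timeout (hypothetical helper). */
final class ZkSessionTimeout {
    /**
     * Clamp the requested timeout into [min, max]. Assumes minSessionTimeout and
     * maxSessionTimeout are unset in zoo.cfg, so they default to 2x and 20x tickTime.
     */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(requestedMs, maxMs));
    }

    /** True if a stop-the-world pause of this length outlasts the negotiated session timeout. */
    static boolean pauseExceedsSession(double pauseSeconds, int sessionMs) {
        return pauseSeconds * 1000.0 > sessionMs;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;     // typical tickTime, per the reply above
        int requestedMs = 600_000; // zkClientTimeout=600000 (10 minutes)
        int effectiveMs = negotiate(requestedMs, tickTimeMs);
        System.out.println("Effective session timeout: " + effectiveMs + " ms");
        // Pause length taken from the GC logs quoted above.
        System.out.println("29.66 s pause exceeds it? " + pauseExceedsSession(29.6583943, effectiveMs));
    }
}
```

Under these assumptions the requested 600000 ms comes back as 40000 ms, and the 29.66-second pause from the GC logs is shorter than that. If the cluster really uses those defaults, the session timeout alone would not explain the recovery, which fits the point that a long GC pause can cause problems in other ways.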
