Screenshots attached.

There are no GC logs. There is also a GC_TUNE_LOG variable, which is set to blank. I've removed the GC_TUNE and GC_TUNE_LOG variables on nodes 3 and 4, but nodes 1 and 2, which are receiving the reindexing traffic, will have to wait until that has hopefully completed (it is about 90% done). (I changed solrconfig.xml for the collection to increase the buffers and told Solr to reload the collection.)

As this whole process will need to be repeated immediately for another large collection after this one, I will hopefully be able to fix the configs on the two in-use nodes sometime tomorrow.

On 11/4/21 09:40, Shawn Heisey wrote:
On 11/4/21 6:27 AM, Michael Conrad wrote:
When there is a high segment count, processes on the systems start showing very high "i/o wait" with associated idle CPU time, according to top. That indicates to me that CPU core count isn't the main culprit.

SOLR_JAVA_MEM="-Xms1g -Xmx5g"
GC_TUNE=""

Don't set GC_TUNE this way.  I just tried it, and it completely disables all GC tuning, at least on Solr 8.10.1.  It does NOT set the GC tuning to Solr defaults, it sets it to Java defaults, and Java defaults have always been terrible for Solr.  If you want Solr's standard GC tuning, which is actually pretty good, remove GC_TUNE from your solr.in.sh file.  For the Java version you're running, I would recommend either Shenandoah or ZGC ... although G1GC is quite good, if it is tuned further than just turning it on.  I do not remember which version of Solr changed its defaults from CMS to G1.  Although I am not a GC expert, I have done some experimenting with different GC options.

I'm betting that the performance problems aren't due to a high segment count.  I think it's more likely that they are due to memory issues.  I don't have enough information yet to determine which of the two memory-related problems I mentioned you're running into.

Can you share solr_gc.log generated during a time when the iowait gets bad?  Be aware that restarting Solr will rotate that log and it will have a number at the end of the filename after that.  You could share all of the GC logs and indicate which one has the right data in it.  It will PROBABLY be the largest file.
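As a rough way to pick out the largest GC log after rotation, something like the following could be used. This is only a sketch: the helper name largest_gc_log is made up for illustration, and the fallback directory /var/solr/logs is an assumption about a service-style install; substitute wherever your Solr logs actually live.

```shell
#!/bin/sh
# Hypothetical helper: print the largest file matching solr_gc.log* in a directory.
largest_gc_log() {
  # $1 is the directory that holds the GC logs.
  # ls -S sorts by size, largest first; head keeps only the first entry.
  # Errors are discarded, so nothing is printed when no file matches.
  ls -S "$1"/solr_gc.log* 2>/dev/null | head -n 1
}

# Assumed location: SOLR_LOGS_DIR if it is set, else /var/solr/logs.
largest_gc_log "${SOLR_LOGS_DIR:-/var/solr/logs}"
```

The largest file is only a guess at which log covers the bad period, so it is still worth checking the timestamps inside it.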

There is also a screenshot that answers a whole bunch of questions about memory use on the server all at once.  How to gather the screenshot is discussed here:

https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue

For devs:

I would have expected an empty GC_TUNE to go with Solr defaults. I see this line in bin/solr (on 8.10.1):

  if [ -z ${GC_TUNE+x} ]; then

I think the +x part should not be there.  With it removed, the script interprets an empty string as undefined and uses the defaults, which I think is correct.  The +x appears in four places in the script.
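To illustrate the difference, here is a small standalone sketch (not taken from bin/solr) that runs both forms of the test against an unset, an empty, and a non-empty GC_TUNE. The function names and the "defaults"/"custom" labels are made up for the demonstration, and the expansions are quoted here, unlike the line quoted above.

```shell
#!/bin/sh
# check_plus_x mirrors the test quoted above; check_plain drops the +x.

check_plus_x() {
  # ${GC_TUNE+x} expands to "x" when GC_TUNE is set (even to ""),
  # and to nothing only when it is unset.
  if [ -z "${GC_TUNE+x}" ]; then echo "defaults"; else echo "custom"; fi
}

check_plain() {
  # ${GC_TUNE} expands to the value, so unset and "" both look empty.
  if [ -z "${GC_TUNE}" ]; then echo "defaults"; else echo "custom"; fi
}

unset GC_TUNE
r_unset_plus=$(check_plus_x);  r_unset_plain=$(check_plain)

GC_TUNE=""
r_empty_plus=$(check_plus_x);  r_empty_plain=$(check_plain)

GC_TUNE="-XX:+UseG1GC"
r_value_plus=$(check_plus_x);  r_value_plain=$(check_plain)

echo "unset:     +x=$r_unset_plus plain=$r_unset_plain"
echo "empty:     +x=$r_empty_plus plain=$r_empty_plain"
echo "non-empty: +x=$r_value_plus plain=$r_value_plain"
```

Only the empty case differs: with +x an empty GC_TUNE counts as a custom setting, and without it the empty value falls through to the defaults branch.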

Thanks,
Shawn

