Screenshots attached.
There are no GC logs. There is also a GC_TUNE_LOG variable, which is set to blank.
I've removed the GC_TUNE and GC_TUNE_LOG variables from nodes 3 and 4,
but nodes 1 and 2, which are receiving the reindexing traffic, will have
to wait until the reindex has completed (it is 90% complete). (I changed
the solrconfig.xml for the collection to increase the buffers and told
Solr to reload the collection.)
As this whole process will need to be repeated immediately for another
large collection after this one, I hope to be able to fix the configs
on the two in-use nodes sometime tomorrow.
On 11/4/21 09:40, Shawn Heisey wrote:
On 11/4/21 6:27 AM, Michael Conrad wrote:
When there is a high segment count, processes on the systems start
showing very high "i/o wait" with associated idle CPU time according
to top, which indicates to me that CPU core count isn't the main culprit.
SOLR_JAVA_MEM="-Xms1g -Xmx5g"
GC_TUNE=""
Don't set GC_TUNE this way. I just tried it, and it completely
disables all GC tuning, at least on Solr 8.10.1. It does NOT set the
GC tuning to Solr defaults, it sets it to Java defaults, and Java
defaults have always been terrible for Solr. If you want Solr's
standard GC tuning, which is actually pretty good, remove GC_TUNE from
your solr.in.sh file. For the Java version you're running, I would
recommend either Shenandoah or ZGC ... although G1GC is quite good, if
it is tuned further than just turning it on. I do not remember which
version of Solr changed its defaults from CMS to G1. Although I am
not a GC expert, I have done some experimenting with different GC
options.
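As a rough sketch of the two choices in solr.in.sh: the flags on the last line are standard HotSpot options chosen only for illustration. They are an assumption, not a tested recommendation and not necessarily what Solr ships as its default.

```shell
# solr.in.sh (sketch)

# Avoid: on 8.10.1 an empty value turns off Solr's GC tuning entirely.
# GC_TUNE=""

# Choice 1: leave GC_TUNE out of the file so bin/solr applies its own defaults.

# Choice 2: set the collector explicitly (illustrative flags, assumed values).
# GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"
```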
I'm betting that the performance problems aren't due to a high segment
count. I think it's more likely that they are due to memory issues.
I don't have enough information yet to determine which of the two
memory-related problems I mentioned you're running into.
Can you share solr_gc.log generated during a time when the iowait gets
bad? Be aware that restarting Solr will rotate that log and it will
have a number at the end of the filename after that. You could share
all of the GC logs and indicate which one has the right data in it.
It will PROBABLY be the largest file.
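One way to spot the largest GC log is sketched below. The function name is made up for illustration, and the log directory is an argument because its location depends on the install.

```shell
# Print the largest file matching solr_gc.log* in the given directory.
# "ls -S" sorts by size, largest first; head keeps only the first entry.
largest_gc_log() {
  ls -S "$1"/solr_gc.log* 2>/dev/null | head -n 1
}
```

It would be called with the directory holding the Solr logs, for example `largest_gc_log /path/to/solr/logs`.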
There is also a screenshot that answers a whole bunch of questions
about memory use on the server all at once. How to gather the
screenshot is discussed here:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
For devs:
I would have expected an empty GC_TUNE to go with Solr defaults. I see
this line in bin/solr (on 8.10.1):
if [ -z ${GC_TUNE+x} ]; then
I think the +x part should not be there. With it removed, the script
interprets an empty string as undefined and uses the defaults, which I
think is correct. The +x appears in four places in the script.
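A small sketch of the difference between the two tests. The function names and labels are invented for the demonstration, and the expansions are quoted here, which the line in bin/solr is not. Without quotes, a multi-word GC_TUNE would probably make the plain `-z` test fail with a syntax error, so a real patch would likely want the quotes as well.

```shell
#!/usr/bin/env bash

# Same logic as the test quoted from bin/solr: ${GC_TUNE+x} expands to "x"
# whenever GC_TUNE is set, even to "", so this is true only when it is unset.
uses_defaults_plus_x() {
  [ -z "${GC_TUNE+x}" ]
}

# Variant without +x: true when GC_TUNE is unset OR set to an empty string.
uses_defaults_plain() {
  [ -z "${GC_TUNE:-}" ]
}

# Print which branch each test would take for the current GC_TUNE state.
report() {
  local a="as-given" b="as-given"
  uses_defaults_plus_x && a="solr-defaults"
  uses_defaults_plain && b="solr-defaults"
  echo "$1: with +x -> $a, without +x -> $b"
}

unset GC_TUNE;          report "unset"
GC_TUNE="";             report "empty"
GC_TUNE="-XX:+UseG1GC"; report "custom"
```

Only the empty case differs: with +x the empty value is passed through as given (so Java defaults apply), and without +x it falls back to Solr's defaults.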
Thanks,
Shawn