Yes, it’s very unusual. On Solr 8.11 (and previous versions), with the same setup and size of data, a reload takes just a few seconds.
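For anyone trying to measure this: the reload can also be issued asynchronously via the Collections API and then polled, so the client-side timeout doesn’t cut the observation short. A sketch, with a placeholder host (the collection name matches the one in the stack trace further down):

```shell
# Sketch: trigger the collection reload asynchronously so the HTTP client
# timeout no longer masks how long the reload really takes.
# SOLR_HOST is a placeholder for one of the cluster nodes.
SOLR_HOST="http://localhost:8983"
COLLECTION="my_collection"

RELOAD_URL="${SOLR_HOST}/solr/admin/collections?action=RELOAD&name=${COLLECTION}&async=reload-1"
STATUS_URL="${SOLR_HOST}/solr/admin/collections?action=REQUESTSTATUS&requestid=reload-1"

# curl "$RELOAD_URL"   # returns immediately with the async request id
# curl "$STATUS_URL"   # poll until the state is "completed" or "failed"
echo "$RELOAD_URL"
```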
We have 6 shards spread across 96 replicas. Each replica is hosted on a dedicated EC2 instance, with no more than one replica on the same machine (in k8s terms: one pod per node, one Solr replica per pod). I am able to reproduce the reload issue on Solr 9.0 and 9.1. I tried to isolate the underlying node along with the Solr pod, but couldn’t identify any issues like high load, iowait, and so on. The only issues I see are exceptions in the Solr logs that never recover unless the pods are restarted (we use emptyDir and not persistent volumes; every time a pod is restarted, the cores that were hosted on it are removed, and when the pod comes back it gets allocated to the same shard or another shard, depending on how many replicas the other shards have; we try to keep the number of replicas balanced).

I agree that a 23GB heap is a bit too much and we are doing some work to optimize it (resizing caches, etc.). We already tried lowering the heap to 20GB; GC performance is better and Solr in general performs better. I must mention that we have the same heap size on Solr 8.11 and it doesn’t cause any issues with the reload. Could it have an impact on Solr 9 somehow?

Thank you a lot for sharing your thoughts, especially for explaining the GC params and sharing yours, very much appreciated. Do you have any ideas on what else we should try?

Here is the digest of what we have tried, without any success:

Zookeeper:
- upgrade from 3.6 to 3.7 and 3.8; no impact.

DNS:
- Solr pods joining and communicating over pod IP instead of pod Svc DNS name (headless).
  This was done to avoid any potential issues with DNS resolvers (even though CoreDNS/nodelocaldns metrics looked OK); no impact.

Lucene:
- upgrade to Lucene 9.1.0, 9.2.0, 9.3.0; no impact.

Solr version:
- tried Solr 9.0.0 and Solr 9.1.0; result: no difference.

Heap:
- recalculated the heap size; reduced it by 3GB (15%) in combination with a cache resize (see below);
- result: better performance, no old-gen GC is triggered, cluster more stable; reload still timing out.

TLOG and PULL:
- tested with 3 TLOG replicas per shard and the remaining 12 replicas of type PULL; also tested with all 15 replicas per shard of type TLOG; NRT is not an option at all, didn’t even try to test it;
- result: better response time with PULL; no impact on reload.

Other tunings, including GC: no impact.

solrconfig.xml:
- directoryFactory: https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/MMapDirectoryFactory.html throws the following exception: o.a.s.c.SolrCore java.lang.IllegalArgumentException: Unknown directory: MMapDirectory@/var/solr/data/my_collection_shard3_replica_t1643/data/snapshot_metadata (we do not use snapshots at all; stack trace: https://justpaste.it/88en6). Switched to https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/StandardDirectoryFactory.html; problem solved, no more Unknown directory exceptions, and the reload no longer fails on some nodes with that exception;
- result: reload still timing out, but fewer exceptions.
- lockType: switched between “native” and “simple” lock types; result: no impact.

HttpShardHandlerFactory:
- increased the timeout by 40% for cross-shard communication during queries; result: no impact.

filterCache, queryResultCache and documentCache:
- limited the size of the caches in megabytes instead of entries: filterCache 1024MB, queryResultCache 1024MB, documentCache 2048MB;
- result: nodes are more stable during reload, the cluster is not destabilizing, no old-gen GC activity; better response time, less pressure on GC.

circuitBreaker:
- disabled the circuitBreaker; result: no impact.
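For reference, the cache resize above was done by capping RAM usage rather than entry counts. A sketch of the relevant solrconfig.xml pieces, assuming the Solr 9.x default CaffeineCache (the autowarmCount values are placeholders, only the maxRamMB figures are from our setup), plus the directoryFactory switch:

```xml
<!-- solrconfig.xml sketch: caches capped by maxRamMB instead of entry count -->
<filterCache class="solr.CaffeineCache" maxRamMB="1024" autowarmCount="0"/>
<queryResultCache class="solr.CaffeineCache" maxRamMB="1024" autowarmCount="0"/>
<documentCache class="solr.CaffeineCache" maxRamMB="2048" autowarmCount="0"/>

<!-- switched from solr.MMapDirectoryFactory to avoid the
     "Unknown directory ... snapshot_metadata" exception on reload -->
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
```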
---
Nick Vladiceanu
vladicean...@gmail.com

> On 20. Dec 2022, at 15:58, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 12/20/22 06:34, Nick Vladiceanu wrote:
>> Thank you Shawn for sharing, indeed useful information.
>> However, I must say that we only used deleteById and never deleteByQuery. We
>> also only rely on the auto segment merging and not issuing optimize command.
>
> That is very unusual. I've never seen a core reload take more than a few
> seconds, even when I was dealing with core sizes of double-digit GB. Unless
> you have hundreds or thousands of replicas for each of your 6 shards, it
> really should complete very quickly.
>
> Have you been able to determine which Solr cores in the collection are
> causing the delay, and take a look at those machines?
>
> Some thoughts:
>
> When you said 96 nodes, were you talking about Solr instances or servers?
> You really should only run one Solr instance per server, especially for a
> small index like this.
>
> A 23GB heap seems very excessive for a 4.7GB index that has less than 4
> million documents. I'm sure you can reduce that by a lot and encounter
> smaller GC pauses as a result. If you can share your GC logs, I should be
> able to provide a recommendation.
>
> I've been looking at what MinHeapFreeRatio and MaxHeapFreeRatio do. Those
> settings are probably unnecessary. This is what I currently use for GC
> tuning on JDK 11 or JDK 17. This produces EXTREMELY short collection pauses,
> but I have noticed that throughput-heavy things like indexing run a bit
> slower, but if the indexing is multi-threaded, I think that it would not be
> affected a lot.
>
> GC_TUNE=" \
>   -XX:+UnlockExperimentalVMOptions \
>   -XX:+UseZGC \
>   -XX:+ParallelRefProcEnabled \
>   -XX:+ExplicitGCInvokesConcurrent \
>   -XX:+UseStringDeduplication \
>   -XX:+AlwaysPreTouch \
>   -XX:+UseNUMA \
> "
>
> ZGC has one unexpected disadvantage.
> Using it will disable Compressed OOPs
> -- meaning that even with a heap smaller than 32GB, it uses 64 bit pointers.
> This hasn't really impacted me ... the index is so small that with a 1GB heap
> I have more than enough. If low pauses are the most important thing you need
> from GC and you're running at least JDK11, I would strongly recommend ZGC.
> It does make indexing slower for me -- a full rebuild that takes 10 minutes
> with G1 takes 11 minutes with ZGC. But even the worst-case GC pauses are
> single-digit milliseconds.
>
> For G1GC, which is still the best option for JDK8, this is what I used to
> have:
>
> #GC_TUNE=" \
> #  -XX:+UseG1GC \
> #  -XX:+ParallelRefProcEnabled \
> #  -XX:MaxGCPauseMillis=100 \
> #  -XX:+ExplicitGCInvokesConcurrent \
> #  -XX:+UseStringDeduplication \
> #  -XX:+AlwaysPreTouch \
> #  -XX:+UseNUMA \
> #"
>
> Thanks,
> Shawn