Thanks Kevin for looking into it. I’ll answer the questions in the original order:

* Pod volume has the correct permissions. We use emptyDir provisioned by the solr-operator. All the nodes have exactly the same setup, no pods are co-located on the same worker node, and no more than one Solr core is located on the same node.
* We are actively indexing and querying, and we also use partial updates. Since we use TLOG replica types, we have a hard commit every 180s that opens a new searcher.
* JDK 11 and JDK 17 behave the same way; we were able to reproduce on both builds.
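For reference, the hard-commit behavior described above corresponds to a solrconfig.xml fragment along these lines. This is a sketch: only the 180 s interval and the "opens a new searcher" behavior are stated above; the surrounding element structure is the usual shape of this setting.

```xml
<!-- Sketch: hard commit every 180 s that opens a new searcher.
     openSearcher=true is assumed from the description above. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>180000</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>
```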
As for the directory exceptions, I also cannot understand why that Unknown Directory exception is thrown. I logged into a Solr pod that was throwing this error and confirmed that the exact location does exist on disk. When a reload fails, it sometimes fails on one node and other times on multiple nodes at the same time. I checked all the logs on the k8s node and on the pod but couldn’t find anything related to disk, network, or other errors.

I was wondering, could it be something wrong with the solrconfig.xml parameters? Perhaps a combination of parameters does not behave stably? Do you think it makes sense to go with a vanilla solrconfig.xml and introduce all the custom options one by one (i.e. ShardHandlerFactory, etc.)?

---
Nick Vladiceanu
vladicean...@gmail.com

> On 18. Jan 2023, at 18:41, Kevin Risden <kris...@apache.org> wrote:
>
> So I am going to share some ideas just in case it triggers something - I
> have this gut feeling that the cores are closing due to an exception of
> some kind. It seems like a lot of the issue is either index corruption or
> "SolrCoreState already closed."
>
> * Does the pod volume have the correct permissions for Solr to read/write?
> * Are you indexing these nodes or just querying? (Asking because if these
>   are meant to be read-only, that would be different from a changing index.)
> * Have you taken https://issues.apache.org/jira/browse/SOLR-16463 into
>   account, by chance, if you have a custom Docker image? (This might not
>   be necessary since you say it reproduces on JDK 11.)
>
> I found this part of your update the most intriguing. Why would changing
> the directory factory change this? My understanding is that everything
> under "/var/solr/data/my_collection_shard3_replica_t1643" should be
> controlled by Solr, both read and write, so any directories underneath
> would be created automatically.
>
>> directoryFactory:
>> https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/MMapDirectoryFactory.html
>> throwing the following exception: o.a.s.c.SolrCore
>> java.lang.IllegalArgumentException: Unknown directory:
>> MMapDirectory@/var/solr/data/my_collection_shard3_replica_t1643/data/snapshot_metadata
>> (we do not use snapshots at all) (stack trace https://justpaste.it/88en6)
>> Switched to
>> https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/StandardDirectoryFactory.html;
>> problem solved, no more Unknown directory exceptions.
>> Reload won’t fail on some nodes with an Unknown directory exception;
>> Result: reload still timing out, fewer exceptions;
>
> My guess is that the reload is going to some node and that one node is
> causing the whole process to time out. If you find that node, you should
> be able to collect the logs and see. Basically, there is some reason the
> cores are closing, and it’s not good. I would guess the collection reload
> timing out is just a symptom of whatever the bigger underlying cause is.
>
> Kevin Risden
>
>
> On Wed, Dec 21, 2022 at 5:57 AM Nick Vladiceanu <vladicean...@gmail.com>
> wrote:
>
>> Yes, it’s very unusual. On Solr 8.11 (and previous versions) with this
>> setup and size of data, a reload takes just a few seconds.
>>
>> We have 6 shards spread across 96 replicas. Each replica is hosted on a
>> dedicated EC2 instance, with no more than one replica on the same machine
>> (in k8s terms, one pod per node, one Solr replica per pod).
>>
>> I am able to reproduce the reload issue on Solr 9.0 and 9.1. I tried to
>> isolate the underlying node along with the Solr pod but couldn’t identify
>> any issues like high load, iowait, or anything similar.
>> The only issues I see are exceptions in the Solr logs that never recover
>> unless pods are restarted. (We use emptyDir and not a persistent volume;
>> every time a pod is restarted, the cores that were hosted on it are
>> removed, and when the pod comes back it gets allocated to the same shard
>> or another shard, depending on how many replicas the other shards have;
>> we try to keep the number of replicas balanced.)
>>
>> I agree that a 23GB heap is a bit too much, and we are doing some work to
>> optimize it (resizing caches, etc.). We already tried lowering the heap
>> to 20GB; GC performance is better, and in general Solr performs better.
>> I must mention that we have the same heap size on Solr 8.11 and it
>> doesn’t cause any issues with the reload. Could it somehow have an impact
>> on Solr 9?
>>
>> Thank you a lot for sharing your thoughts, especially for explaining the
>> GC params and sharing yours; very much appreciated.
>>
>> Do you have any ideas on what else we should try? Here is a digest of
>> what we have tried, without any success:
>>
>> Zookeeper: upgraded from 3.6 to 3.7 and 3.8; no impact;
>>
>> DNS: Solr pods joining and communicating over pod IP instead of the pod
>> Service DNS name (headless).
>> This was done in order to avoid any potential issues with DNS resolvers
>> (even though CoreDNS/nodelocaldns metrics looked OK); no impact;
>>
>> Lucene: upgraded to Lucene 9.1.0, 9.2.0, 9.3.0; no impact;
>>
>> Solr nodes:
>> version:
>>   tried Solr 9.0.0 and Solr 9.1.0;
>>   Result: no difference;
>> Heap:
>>   recalculated the heap size;
>>   reduced the size by 3GB (15%) in combination with a cache resize (see
>>   below);
>>   Result: better performance, no old GC is triggered; cluster more
>>   stable; reload still timing out;
>> TLOG and PULL:
>>   tested with 3 TLOG replicas per shard and the remaining 12 replicas
>>   PULL;
>>   tested with all 15 replicas per shard of type TLOG;
>>   NRT is not an option at all, didn’t even try to test it;
>>   Result: better response time with PULL, no impact on reload;
>> Other tunings, including GC; no impact;
>>
>> solrconfig.xml:
>> directoryFactory:
>>   https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/MMapDirectoryFactory.html
>>   throwing the following exception: o.a.s.c.SolrCore
>>   java.lang.IllegalArgumentException: Unknown directory:
>>   MMapDirectory@/var/solr/data/my_collection_shard3_replica_t1643/data/snapshot_metadata
>>   (we do not use snapshots at all) (stack trace https://justpaste.it/88en6)
>>   Switched to
>>   https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/StandardDirectoryFactory.html;
>>   problem solved, no more Unknown directory exceptions;
>>   Reload won’t fail on some nodes with an Unknown directory exception;
>>   Result: reload still timing out, fewer exceptions;
>> lockType:
>>   switched between “native” and “simple” lock type;
>>   Result: no impact;
>> HttpShardHandlerFactory:
>>   increased the timeout by 40% for cross-shard communication during
>>   queries;
>>   Result: no impact;
>> filterCache, queryResultCache, and documentCache:
>>   limited the size of the caches in megabytes instead of entries:
>>   filterCache - 1024MB
>>   queryResultCache - 1024MB
>>   documentCache - 2048MB
>>   Result: nodes are more stable during reload, the cluster is not
>>   destabilizing, no old GC activity; better response time, less pressure
>>   on GC;
>> circuitBreaker:
>>   disabled the circuitBreaker;
>>   Result: no impact;
>>
>> ---
>> Nick Vladiceanu
>> vladicean...@gmail.com
>>
>>
>>
>>> On 20. Dec 2022, at 15:58, Shawn Heisey <apa...@elyograg.org> wrote:
>>>
>>> On 12/20/22 06:34, Nick Vladiceanu wrote:
>>>> Thank you Shawn for sharing, indeed useful information.
>>>> However, I must say that we only used deleteById and never
>>>> deleteByQuery. We also rely only on automatic segment merging and never
>>>> issue the optimize command.
>>>
>>> That is very unusual. I’ve never seen a core reload take more than a
>>> few seconds, even when I was dealing with core sizes of double-digit GB.
>>> Unless you have hundreds or thousands of replicas for each of your 6
>>> shards, it really should complete very quickly.
>>>
>>> Have you been able to determine which Solr cores in the collection are
>>> causing the delay, and take a look at those machines?
>>>
>>> Some thoughts:
>>>
>>> When you said 96 nodes, were you talking about Solr instances or
>>> servers? You really should only run one Solr instance per server,
>>> especially for a small index like this.
>>>
>>> A 23GB heap seems very excessive for a 4.7GB index that has fewer than
>>> 4 million documents. I’m sure you can reduce that by a lot and encounter
>>> smaller GC pauses as a result. If you can share your GC logs, I should
>>> be able to provide a recommendation.
>>>
>>> I’ve been looking at what MinHeapFreeRatio and MaxHeapFreeRatio do.
>>> Those settings are probably unnecessary. This is what I currently use
>>> for GC tuning on JDK 11 or JDK 17. It produces EXTREMELY short
>>> collection pauses. I have noticed that throughput-heavy things like
>>> indexing run a bit slower, but if the indexing is multi-threaded, I
>>> think it would not be affected much.
>>>
>>> GC_TUNE=" \
>>>   -XX:+UnlockExperimentalVMOptions \
>>>   -XX:+UseZGC \
>>>   -XX:+ParallelRefProcEnabled \
>>>   -XX:+ExplicitGCInvokesConcurrent \
>>>   -XX:+UseStringDeduplication \
>>>   -XX:+AlwaysPreTouch \
>>>   -XX:+UseNUMA \
>>> "
>>>
>>> ZGC has one unexpected disadvantage. Using it will disable Compressed
>>> OOPs -- meaning that even with a heap smaller than 32GB, it uses 64-bit
>>> pointers. This hasn’t really impacted me ... the index is so small that
>>> with a 1GB heap I have more than enough. If low pauses are the most
>>> important thing you need from GC and you’re running at least JDK 11, I
>>> would strongly recommend ZGC. It does make indexing slower for me -- a
>>> full rebuild that takes 10 minutes with G1 takes 11 minutes with ZGC.
>>> But even the worst-case GC pauses are single-digit milliseconds.
>>>
>>> For G1GC, which is still the best option for JDK8, this is what I used
>>> to have:
>>>
>>> #GC_TUNE=" \
>>> #  -XX:+UseG1GC \
>>> #  -XX:+ParallelRefProcEnabled \
>>> #  -XX:MaxGCPauseMillis=100 \
>>> #  -XX:+ExplicitGCInvokesConcurrent \
>>> #  -XX:+UseStringDeduplication \
>>> #  -XX:+AlwaysPreTouch \
>>> #  -XX:+UseNUMA \
>>> #"
>>>
>>> Thanks,
>>> Shawn
>>
>>
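Acting on Kevin's suggestion earlier in the thread to find the node that stalls the reload, one option is to issue the collection reload asynchronously through the Collections API and then poll the request status, rather than waiting for the synchronous call to time out. A sketch, assuming a node reachable at localhost:8983 and the collection name my_collection:

```shell
# Start the reload asynchronously; Solr returns immediately and tracks
# the operation under the given request id.
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=my_collection&async=reload-1"

# Poll the status; the response shows whether the request is still running,
# completed, or failed, along with any failure messages from the nodes.
curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=reload-1"
```

This avoids the whole-collection timeout masking which replica is the slow one; the node whose part of the async request never completes is the one whose logs are worth collecting.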