Hi Shawn,

Thank you for your explanations and advice.
I agree that my terminology mixing up shard and replica was wrong. I'm trying to understand and fix intermittent spikes in heap memory usage and thread count. When the issue occurs, a heap dump shows a lot of threads building the filterCache, each consuming a lot of heap memory. As a consequence, even though the heap is not full, there are several consecutive full GCs.

In the solr.log file I also see CDCR-related messages, and I see concurrent searchers opening within a few seconds on the same replica. I don't understand these searchers opening in the same second, as autoSoftCommit maxTime is 60000 and autoCommit maxTime is 300000 with openSearcher set to false.

solr.log.18:394492:2022-11-23 14:45:37.134 INFO (zkCallback-5-thread-128) [ ] o.a.s.h.CdcrLeaderStateManager Received new leader state @ SSSS:shard1
solr.log.18:418525:2022-11-23 14:45:48.533 INFO (Thread-2218) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening [Searcher@39764e3a[SSSS_shard1_replica_t89] main]
solr.log.18:418626:2022-11-23 14:45:49.535 INFO (Thread-2218) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening [Searcher@3dd70403[SSSS_shard1_replica_t89] main]
solr.log.18:418667:2022-11-23 14:45:50.090 INFO (recoveryExecutor-4-thread-13-processing-n:no2fyy27.noe.edf.fr:8984_solr x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening [Searcher@4a51a759[SSSS_shard1_replica_t89] main]
solr.log.18:418682:2022-11-23 14:45:50.153 INFO (recoveryExecutor-4-thread-13-processing-n:no2fyy27.noe.edf.fr:8984_solr x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ SSSS:shard1

or

solr.log.18:454048:2022-11-23 14:59:21.351 INFO (Thread-2668) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening [Searcher@273ebecf[SSSS_shard1_replica_t89] main]
solr.log.18:454403:2022-11-23 14:59:21.993 INFO (Thread-2668) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening [Searcher@711992c5[SSSS_shard1_replica_t89] main]
solr.log.18:454484:2022-11-23 14:59:22.588 INFO (recoveryExecutor-4-thread-17-processing-n:no2fyy27.noe.edf.fr:8984_solr x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening [Searcher@5258502d[SSSS_shard1_replica_t89] main]
solr.log.18:454502:2022-11-23 14:59:22.609 INFO (recoveryExecutor-4-thread-17-processing-n:no2fyy27.noe.edf.fr:8984_solr x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ SSSS:shard1

And finally, yes, there are two Solr instances per server. The architecture is:

* 7 servers (96GB RAM / 12 CPU)
* 14 Solr instances (24GB heap each)
* One huge collection: 14 shards x 2 replicas, all TLOG
* Total number of documents: 1.5 billion (100 million per shard)

Regards

Dominique

On Fri, Dec 16, 2022 at 09:43, Shawn Heisey <[email protected]> wrote:

> On 12/15/22 12:43, Dominique Bejean wrote:
> > I have a sharded collection distributed over several Solr nodes. Each
> > Solr node hosts one shard and one replica of another shard. Shards are
> > huge (100 million documents). Queries use several filterQuery
> > parameters. The filterCache for this number of documents can use a high
> > amount of heap memory.
>
> Terminology nit: Each node is hosting two replicas of different shards.
> It's not "a shard and a replica" ... it's "two replicas of shard N".
> One replica becomes leader, but that is a mutable temporary distinction.
> Any NRT or TLOG replica can become leader.
>
> > Is it a good idea to split shards by 2 or 4 in order to have shards
> > with 50 or 25 million documents?
> > With a split by 4, a Solr node will host 8 replicas instead of 2, but
> > with a smaller filterCache for each replica.
>
> The total amount of heap memory required for the filterCache will not be
> reduced by this. And there are other per-core memory structures that
> would need more total heap memory, not less.
>
> If you have a LOT of CPU capacity and a fairly low query rate, more
> shards per Solr instance can yield a net increase in query performance.
> But if the query rate is high or the CPU core count is low, you're
> probably better off with fewer shards.
>
> > I don't expect to have better search performance, but I expect to have
> > faster warming and mainly less heap memory impact from the searcher
> > opened during softcommit. For instance, instead of having one large
> > filterCache warming up once each minute, 4 smaller filterCaches will
> > warm up not at the same time (hopefully).
>
> Multiple smaller caches at the same time might warm a bit faster if
> system capacities are sufficient. But you seem to be under the
> impression that running eight smaller indexes instead of two larger
> indexes will drop your heap requirement. It won't. It would actually
> increase it, and due to efficiencies in the way Lucene builds each
> index, it would also increase your disk space requirements. I don't
> have any way of knowing how much these things would increase.
>
> Multiple threads/processes is the best way to increase Solr indexing
> performance. Memory is the best way to increase general Solr
> performance -- lots of extra memory for disk caching, especially with
> really big indexes. Which might mean greatly increasing the server
> count and also increasing the shard count, so each server's indexes are
> smaller and fit into the disk cache better.
>
> At least you haven't mentioned running multiple Solr instances per
> machine.
> That only makes sense for REALLY large installs. I'm sorry to
> tell you that while yours is not small, we've heard from people who have
> billions of documents and terabytes of index data when counting only one
> replica of each shard. For now Solr sees better memory efficiency by
> keeping the heap below 32GB, so for really large systems, two Solr
> instances each with a 31GB heap actually has MORE memory available for
> Solr than one instance with a 64GB heap.
>
> Thanks,
> Shawn
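For reference, the commit settings discussed in the thread (hard commit every 5 minutes without opening a searcher, soft commit every 60 seconds) would correspond to a solrconfig.xml fragment along these lines (the surrounding updateHandler element is shown only for context; other attributes are assumed defaults):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 300000 ms (5 min); openSearcher=false means
       it only flushes segments and does NOT open a new searcher -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every 60000 ms (1 min); this is what opens a new
       searcher and triggers cache warming -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With these settings a searcher should open at most roughly once per minute per core, which is why the back-to-back searcher openings in the logs above, coming from recoveryExecutor threads next to CDCR messages, point at recovery activity rather than at autoSoftCommit.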
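To make the filterCache heap discussion concrete, here is a rough worst-case estimate, assuming each filterCache entry is stored as a full bitset of maxDoc/8 bytes (one bit per document; Solr can store sparse filters more compactly, so real usage may be lower) and assuming a hypothetical cache size of 512 entries:

```python
# Worst-case filterCache heap estimate for one Solr core.
# Assumption: each cache entry is a full bitset of maxDoc/8 bytes.

def filter_cache_bytes(max_doc: int, cache_size: int) -> int:
    """Worst-case heap bytes for a filterCache holding cache_size full bitsets."""
    bytes_per_entry = max_doc // 8  # one bit per document in the core
    return bytes_per_entry * cache_size

docs_per_shard = 100_000_000  # ~100 million docs per shard, as in the thread
cache_entries = 512           # hypothetical filterCache size setting

per_entry_mib = (docs_per_shard // 8) / (1024 * 1024)
total_gib = filter_cache_bytes(docs_per_shard, cache_entries) / (1024 ** 3)

print(f"~{per_entry_mib:.1f} MiB per entry, ~{total_gib:.1f} GiB if the cache fills")
```

At 100 million documents, each entry is about 12 MiB, so a full cache of 512 entries would be around 6 GiB of a 24GB heap for a single core, which is consistent with the observation that splitting shards does not reduce the total filterCache footprint: the same documents are simply spread over more bitsets.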
