Can you please share the hardware details (server type, CPU speed and type, disk speed and type) and the GC configuration? Also, please post the output of top and iotop if you can.
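For anyone gathering that GC information: Java 8 supports the `Print*` family of GC-logging flags, and Solr's startup script reads JVM options from solr.in.sh. The variable names match stock Solr, but the log path and rotation sizes below are assumptions — a sketch, not the poster's actual config:

```shell
# Hypothetical additions to solr.in.sh (Java 8 syntax; Java 9+ uses -Xlog:gc*).
# Log path and rotation values are placeholder assumptions.
GC_LOG_OPTS="-Xloggc:/var/solr/logs/solr_gc.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -XX:+UseGCLogFileRotation \
  -XX:NumberOfGCLogFiles=9 \
  -XX:GCLogFileSize=20M"
```

A live view is also possible with `jstat -gcutil <solr-pid> 5s`, which prints heap-region occupancy and cumulative GC counts/times every five seconds.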
Deepak

"The greatness of a nation can be judged by the way its animals are treated" - Mahatma Gandhi
+91 73500 12833 | deic...@gmail.com
LinkedIn: www.linkedin.com/in/deicool
"Plant a Tree, Go Green"
Make In India: http://www.makeinindia.com/home

On Thu, Jun 20, 2024 at 11:24 AM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:

> Use tlog+pull replicas; they will improve the situation significantly.
>
> Thu, 20 Jun 2024, 07:27 Saksham Gupta <saksham.gu...@indiamart.com.invalid>:
>
> > Hi All,
> >
> > We have been facing load incidents in which higher GC count and GC time
> > cause higher response times and timeouts.
> >
> > Solr Cloud Cluster Details
> >
> > We use Solr Cloud v8.10 [with Java 8 and G1 GC] with 8 shards, where
> > each shard is hosted on a single VM with 16 cores and 50 GB RAM. Each
> > shard is ~28 GB, and the Solr heap is 16 GB [heap is used only for the
> > filter, document, and queryResults caches, each of size 512].
> >
> > Problem Details
> >
> > We pause indexing at 11 AM during peak searching hours. Normally the
> > system remains stable during peak hours, but when the document update
> > count on Solr is higher before peak hours [between 5:30 AM and 11 AM],
> > we face multiple load issues: GC count and GC time increase, and CPU is
> > consumed by GC itself, increasing the load and response time of the
> > system. To mitigate this, we recently increased the RAM on the servers
> > [from 42 GB to 50 GB] to reduce the I/O wait from repeatedly reading the
> > Solr index into memory. Taking a step further, we also increased the
> > Solr heap from 12 GB to 16 GB [we also tried other combinations such as
> > 14 GB, 15 GB, and 18 GB]. Although the lower I/O wait reduced the load
> > issues somewhat, the problem recurs whenever heavier indexing is done.
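On the tlog+pull suggestion above: PULL replicas only replicate the index from the leader and never index locally, so search traffic served from them is isolated from the GC/CPU churn that heavy indexing causes. A minimal sketch of adding one via the Solr Collections API ADDREPLICA action — the collection, shard, node, and base URL below are hypothetical placeholders, not the poster's actual cluster:

```python
# Sketch: building an ADDREPLICA request for the Solr Collections API.
# Collection/shard/node names and the base URL are assumed placeholders.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL


def add_replica_url(collection, shard, node, replica_type):
    """Build an ADDREPLICA URL for a replica of the given type
    (NRT, TLOG, or PULL)."""
    params = urlencode({
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
        "type": replica_type,
    })
    return f"{SOLR}/admin/collections?{params}"


# The URL would then be fetched with e.g. urllib.request.urlopen(url).
url = add_replica_url("products", "shard1", "host2:8983_solr", "PULL")
print(url)
```

Once PULL replicas exist, queries can be steered to them with the `shards.preference=replica.type:PULL` request parameter, keeping searches off the indexing (TLOG/NRT) nodes.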
> >
> > We have explored a few options such as expungeDeletes, which may help
> > reduce the percentage of deleted documents, but it cannot be executed
> > close to peak hours, as it increases I/O wait, which further spikes
> > Solr's load and response time significantly.
> >
> > 1. Apart from changing the expungeDeletes timing, is there another
> >    option we can try to mitigate this problem?
> >
> > 2. Approximately 60 million documents are updated each day, i.e. ~30%
> >    of the complete Solr index is modified daily, while serving ~20
> >    million search requests. We would appreciate any guidance on how to
> >    handle such high indexing + search traffic during peak hours.
> >
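For reference, the expungeDeletes operation discussed above is triggered as a commit parameter on the update handler. A minimal sketch, assuming a placeholder base URL and collection name — this merge is I/O-heavy, so it would typically be fired from a scheduler (e.g. cron) well outside peak search hours:

```python
# Sketch: an off-peak expungeDeletes commit via Solr's update handler.
# Base URL and collection name are hypothetical placeholders.
from urllib import request

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL
COLLECTION = "products"              # hypothetical collection name


def expunge_url(base=SOLR, collection=COLLECTION):
    # commit=true with expungeDeletes=true asks Solr to merge segments
    # until deleted documents are purged, reclaiming disk space and
    # lowering the deleted-document percentage.
    return f"{base}/{collection}/update?commit=true&expungeDeletes=true"


def run_expunge():
    # Fires the request; expects a running Solr node at SOLR.
    req = request.Request(expunge_url(), data=b"{}",
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status


url = expunge_url()
print(url)
```

Running this from cron at, say, 2 AM keeps the merge I/O far away from the 11 AM search peak described in the thread.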