Are you seeing I/O wait, GC pauses, or something else? Do you commit often or in
one big batch?

> On Jun 20, 2024, at 12:26 AM, Saksham Gupta 
> <saksham.gu...@indiamart.com.invalid> wrote:
> 
> Hi All,
> 
> We have been facing load incidents caused by higher GC counts and GC times,
> which lead to higher response times and timeouts.
> 
> Solr Cloud Cluster Details
> 
> We use SolrCloud v8.10 [with Java 8 and G1 GC] with 8 shards, where each
> shard is hosted on a single VM with 16 cores and 50 GB RAM. Each shard is
> ~28 GB and the Solr heap is 16 GB [the heap is used only for the filter,
> document, and queryResults caches, each of size 512].
> 
> Problem Details
> 
> We pause indexing at 11 AM for the peak search hours. Normally the system
> remains stable during the peak hours, but when the document update count on
> Solr is higher before peak hours [between 5:30 AM and 11 AM], we face
> multiple load issues. The GC count and GC time increase, and CPU is
> consumed by GC itself, thereby increasing the load and response time of the
> system. To mitigate this, we recently increased the RAM on the servers [to
> 50 GB from 42 GB previously], so as to reduce the I/O wait from writing the
> Solr index to memory multiple times. Taking a step further, we also
> increased the Solr heap from 12 to 16 GB [and also tried other combinations
> such as 14 GB, 15 GB, and 18 GB]. Although we saw some reduction in load
> issues due to lower I/O wait, the issue still recurs when indexing volume
> is higher.
> 
> We have explored a few options such as expunge deletes, which may help
> reduce the percentage of deleted documents, but it cannot be run close to
> peak hours, as it increases I/O wait, which in turn significantly spikes
> the load and response time of Solr.
> 
> 
> 1. Apart from changing the expunge deletes timing, is there another option
>    which we can try to mitigate this problem?
> 2. Approximately 60 million documents are updated each day, i.e. ~30% of
>    the complete Solr index is modified each day, while serving ~20 million
>    search requests. We would appreciate any guidance on how to handle such
>    high indexing + searching traffic during peak hours.
