On 11/3/2021 1:44 PM, Michael Conrad wrote:
Is there a way to set a max segment count for the built-in merge policy?

I'm having a serious issue: while reindexing 75 million documents, the segment count skyrockets, with a significant associated drop in performance, to the point that we start getting lots of timeouts.

Is there a way to set the merge policy to try to keep the total segment count to around 16 or so? (That seems to be close to the most the hosts can manage without serious performance issues.)

Solr 7.7.3.

The way to reduce the total segment count is to reduce the thresholds that used to be controlled by mergeFactor. I don't know of a way to explicitly cap the total count, but the per-tier count will affect the total. There will be at least three tiers of merging on most Solr installs, so the total segment count can reach at least three times the per-tier setting.

This config represents the defaults for Solr's merging policy:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>

On some Solr servers that I used to manage, those numbers were set to 35. I regularly saw total segment counts larger than 100. That did not affect performance in a significant way.
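If you do want to push the total toward your target of 16, the three-tier arithmetic above suggests a per-tier setting of around 5 would give a ceiling in the neighborhood of 15 segments. A sketch of such a config follows; the exact values are a starting point to experiment with, not a tested recommendation:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
</mergePolicyFactory>

Be aware that lower numbers mean more frequent merging, which will slow indexing.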

If you are seeing significant performance problems, it is more likely caused by one of two things that have nothing to do with the segment count:

1) Your max heap size is not quite big enough and needs to be increased. This can lead to severe GC pauses because Java will spend more time doing GC than running the application.
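As a sketch: on a typical install the heap is set with SOLR_HEAP in solr.in.sh (often /etc/default/solr.in.sh on Linux). The 8g value here is only an example; the right number depends entirely on your setup, and you should confirm with GC logs rather than guess:

SOLR_HEAP="8g"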

2) Your index is so big that the amount of free memory on the server cannot effectively cache it. The fix for that is to add physical memory, so that more unallocated memory is available to the operating system. Solr is absolutely reliant on effective index caching for performance.

More of a side note: One problem that you might be having with indexing millions of documents is that the indexing thread can get paused when merging becomes heavy. This will be even more likely to happen if you reduce the numbers in the config that I included above. The fix for that is to fiddle with the mergeScheduler config.

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">6</int>
  <int name="maxThreadCount">1</int>
</mergeScheduler>

Some notes: Go with a maxMergeCount that's at least 6. If your indexes are on spinning hard disks, leave maxThreadCount at 1. If the indexes are on SSD, you can increase the thread count, but don't go too wild. Probably 3 or 4 max, and I would be more likely to choose 2. I have never had indexes on SSD, so I do not know how many threads are too many.
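Putting those notes together for an SSD machine, a sketch might look like the following. The thread count of 2 is an assumption along the lines described above, not something I have tested:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">6</int>
  <int name="maxThreadCount">2</int>
</mergeScheduler>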

Thanks,
Shawn
