Would either of these two settings possibly help, at least during full
reindexes?
<!-- ramBufferSizeMB sets the amount of RAM that may be used by Lucene
     indexing for buffering added documents and deletions before they are
     flushed to the Directory.
     maxBufferedDocs sets a limit on the number of documents buffered
     before flushing.
     If both ramBufferSizeMB and maxBufferedDocs are set, then
     Lucene will flush based on whichever limit is hit first. -->
<!-- <ramBufferSizeMB>100</ramBufferSizeMB> -->
<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
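For example, during a full reindex we could uncomment and raise the RAM
buffer in the <indexConfig> section, leaving maxBufferedDocs unset so only
the RAM limit triggers flushes (512 is just a guess on my part, not a
tested value):

<indexConfig>
  <!-- larger buffer = fewer, bigger flushed segments during bulk indexing -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
</indexConfig>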
On 11/4/21 08:27, Michael Conrad wrote:
When segment count is around 16 to 20, performance seems OK.
Here is our config:
Solr 7.7.3.
java -version
openjdk version "11.0.11" 2021-04-20
Our current config is 4 Solr nodes with 5 collections. Each collection
is split into two shards, and each collection is replicated.
They are Amazon VMs with 8GB RAM and 2 CPU cores each. Index storage is
on NVMe-backed volumes.
When there is a high segment count, processes on the systems start
showing very high I/O wait with associated idle CPU time according to
top, which indicates to me that CPU core count isn't the main culprit.
SOLR_JAVA_MEM="-Xms1g -Xmx5g"
GC_TUNE=""
SOLR_LOG_LEVEL="WARN"
vm.swappiness = 0
swap in use = 0
free -h
              total        used        free      shared  buff/cache   available
Mem:           7.7G        6.3G        130M         80M        1.3G        1.0G
Swap:           15G          0B         15G
On 11/3/21 23:19, Shawn Heisey wrote:
On 11/3/2021 1:44 PM, Michael Conrad wrote:
Is there a way to set max segment count for builtin merge policy?
I'm having a serious issue where I'm trying to reindex 75 million
documents and the segment count skyrockets, with an associated
significant drop in performance, to the point that we start getting
lots of timeouts.
Is there a way to set the merge policy to try and keep the total
segment count to around 16 or so? (This seems to be close to the max
the hosts can manage without having serious performance issues.)
Solr 7.7.3.
The way to reduce total segment count is to reduce the thresholds
that used to be controlled by mergeFactor. I don't know of a way to
explicitly set the max total count, but the per-tier count will
affect the total count. There will be at least three tiers of
merging on most Solr installs, so the max total segment count will be
at least three times the per-tier setting.
This config represents the defaults for Solr's merging policy:
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
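As a worked example: with segmentsPerTier at the default 10 and at least
three tiers of merging, the total can reach 30 or more segments. To aim
for a total around 16, the per-tier value would need to be around 5. A
sketch of that change (the value 5 is illustrative, not something I have
benchmarked on your workload):

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <!-- lower values mean fewer segments per tier, but more merge activity -->
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
</mergePolicyFactory>

Keep in mind that lowering these values makes merging more aggressive,
which ties into the indexing-pause issue I describe below.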
On some Solr servers that I used to manage, those numbers were set to
35. I regularly saw total segment counts larger than 100. That did
not affect performance in a significant way.
If you are seeing significant performance problems, the cause is more
likely one of two things that have nothing to do with the segment count:
1) Your max heap size is not quite big enough and needs to be
increased. This can lead to severe GC pauses because Java will spend
more time doing GC than running the application.
2) Your index is so big that the amount of free memory on the server
cannot effectively cache it. The fix for that is to add physical
memory, so that more unallocated memory is available to the operating
system. Solr is absolutely reliant on effective index caching for
performance.
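A quick way to check for the second problem (the path below is
illustrative; adjust it to wherever your Solr home actually lives):

du -sh /var/solr/data/*/data/index

Compare that total to the "available" column of free -h. If the index is
much larger than the memory available for caching, query performance
will suffer badly.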
More of a side note: One problem that you might be having with
indexing millions of documents is that the indexing thread can get
paused when merging becomes heavy. This will be even more likely to
happen if you reduce the numbers in the config that I included
above. The fix for that is to fiddle with the mergeScheduler config.
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">6</int>
  <int name="maxThreadCount">1</int>
</mergeScheduler>
Some notes: Go with a maxMergeCount that's at least 6. If your
indexes are on spinning hard disks, leave maxThreadCount at 1. If the
indexes are on SSD, you can increase the thread count, but don't go
too wild. Probably 3 or 4 max, and I would be more likely to choose
2. I have never had indexes on SSD, so I do not know how many
threads are too many.
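For what it's worth, if the indexes are on fast SSD storage, a starting
point might look like this (the thread count of 2 is the conservative
choice I mentioned above, not something I have benchmarked):

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">6</int>
  <!-- 2 concurrent merge threads; only worthwhile on SSD-class storage -->
  <int name="maxThreadCount">2</int>
</mergeScheduler>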
Thanks,
Shawn