On 6/22/2021 2:24 PM, Stephen Lewis Bianamara wrote:
Thanks Shawn! That is really helpful to know. Can you say more about what
circumstance might cause an index to triple in size? Is it connected with
bulk operations like "optimize" which can be avoided, or is it inherent to
situations like merging segments? And if so, can this requirement be
adjusted by an appropriate setting of maxMergedSegmentMB or something
similar?


Any merge, whether it's optimize (forcemerge) or normal merging, can involve the entire index.

Let's say you have an index that has a number of very large segments.  Either you optimized it at some point or it's just been running for a long time and has reached that state naturally.

You begin a reindexing process.  This process hits almost all the documents in the index, but a few are left untouched.

Those few untouched documents mean that the segments containing them must stick around, even though they consist almost entirely of deleted documents.

At this point, without even doing an optimize, the index has doubled in size -- the original segments are still there because they contain a few not-deleted docs, and all the new data is in new segments.  In practice, some of those older segments probably got merged and shrank, but we're discussing worst-case scenarios here, so pretend for a moment that they have not been merged away.

Then either you do some more indexing that triggers a super-large merge, or you run an optimize.  A merge writes its new segment in full before the segments it replaces are deleted, so with the index already doubled in size, that further merging could add the whole index again.  Only once the merge finishes are the older segments deleted, and you're back to 1x.

Realistically, you probably need enough space for the index to reach 2.5x when doing in-place reindexing, but if the planets all align just right, you could need 3x.  If you never reindex the whole thing in place (without either creating a new index or deleting the existing one) then you would only need 2x.  But because sometimes the planets do align just right, I tell people to have 3x just in case.
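
To put rough numbers on that, here is a small back-of-the-envelope sketch of the scenario above. It is only a model, not anything Solr or Lucene provides: the function name and its parameters are made up for illustration, the handful of untouched documents are treated as negligible in size, and the 0.5 "merged away" fraction is just an assumed value that happens to reproduce the 2.5x figure.

```python
def disk_usage_stages(index_gb, reindexed_in_place, old_merged_away=0.0):
    """Return (stage, size_gb) pairs for a simplified worst-case timeline.

    index_gb           -- size of the live index at 1x (hypothetical input)
    reindexed_in_place -- True if nearly every document was reindexed
                          without creating a new index first
    old_merged_away    -- assumed fraction (0.0 to 1.0) of the old,
                          mostly-deleted segments that normal merging
                          already cleaned up before the big merge
    """
    stages = [("steady state", index_gb)]
    current = index_gb

    if reindexed_in_place:
        # New copies of the documents land in new segments, while the old
        # segments stay on disk unless merging has already removed them.
        leftover_old = index_gb * (1.0 - old_merged_away)
        current = leftover_old + index_gb
        stages.append(("after in-place reindex", current))

    # A merge of the whole index writes a full copy of the live data
    # before any of the segments it replaces are deleted.
    stages.append(("during full merge", current + index_gb))
    stages.append(("after merge cleanup", index_gb))
    return stages


def peak_multiple(index_gb, reindexed_in_place, old_merged_away=0.0):
    """Peak disk usage expressed as a multiple of the 1x index size."""
    stages = disk_usage_stages(index_gb, reindexed_in_place, old_merged_away)
    return max(size for _, size in stages) / index_gb


if __name__ == "__main__":
    for label, size in disk_usage_stages(100.0, True):
        print(f"{label}: {size:.0f} GB")
```

With a 100 GB index, this model peaks at 300 GB if none of the old segments were merged away, 250 GB if half of them were, and 200 GB if the index was never reindexed in place.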

Thanks
Shawn
