On 6/22/2021 2:24 PM, Stephen Lewis Bianamara wrote:
> Thanks Shawn! That is really helpful to know. Can you say more about what circumstance might cause an index to triple in size? Is it connected with bulk operations like "optimize" which can be avoided, or is it inherent to situations like merging segments? And if so, can this requirement be adjusted by an appropriate setting of maxMergedSegmentMB or something similar?
Any merge, whether it's optimize (forcemerge) or normal merging, can involve the entire index.
Let's say you have an index that has a number of very large segments. Either you optimized it at some point or it's just been running for a long time and has reached that state naturally.
You begin a reindexing process. This process hits almost all the documents in the index, but a few are left untouched.
Those few untouched documents mean that the segments containing them must stick around, even though they consist almost entirely of deleted documents.
At this point, without even doing an optimize, the index has doubled in size -- the original segments are still there because they contain a few not-deleted docs, and all the new data is in new segments. In practice, some of those older segments probably got merged and shrank, but we're discussing worst-case scenarios here, so pretend for a moment that they have not been merged away.
Then you either do some more indexing that triggers a super-large merge, or you run an optimize. With the index already doubled in size, that merge could write another full copy of the index before the older segments are deleted and you're back to 1x.
Realistically, you probably need enough space for the index to reach 2.5x when doing in-place reindexing, but if the planets all align just right, you could need 3x. If you never reindex the whole thing in place (without either creating a new index or deleting the existing one) then you would only need 2x. But because sometimes the planets do align just right, I tell people to have 3x just in case.
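To put rough numbers on the scenario above, here is a minimal Python sketch of the worst-case arithmetic. It is only an illustration: the function names and parameters (peak_multiplier, extra_space_needed) are made up for this example and are not part of Solr or Lucene, and it assumes the worst case where none of the old segments get merged away early.

```python
def peak_multiplier(reindexed_in_place: bool, large_merge: bool) -> float:
    """Worst-case peak disk usage as a multiple of the live index size.

    Hypothetical helper for illustration only; not a Solr/Lucene API.
    """
    multiplier = 1.0  # the live index itself
    if reindexed_in_place:
        # Old segments stay on disk because each still holds a few
        # not-deleted docs, while all the new data sits in new segments.
        multiplier += 1.0
    if large_merge:
        # A merge (or optimize) can write a full new copy of the index
        # before the segments it replaces are deleted.
        multiplier += 1.0
    return multiplier


def extra_space_needed(index_bytes: int, multiplier: float) -> int:
    """Free space needed beyond what the 1x index already occupies."""
    return int(index_bytes * (multiplier - 1.0))


if __name__ == "__main__":
    index_bytes = 100 * 1024**3  # assumed example: a 100 GiB index
    worst = peak_multiplier(reindexed_in_place=True, large_merge=True)
    print(f"worst-case multiplier: {worst}x")
    print(f"extra free space: {extra_space_needed(index_bytes, worst)} bytes")
```

With in-place reindexing plus a large merge this gives 3x; with merging alone, and no in-place reindex, it gives 2x.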
Thanks,
Shawn