[
https://issues.apache.org/jira/browse/LUCENE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-5941:
-------------------------------
Attachment: LUCENE-5941.patch
Patch fixes the test bug -- we cannot assume that a merged index's size is
always <= the starting index size (e.g. if compression is affected, or the
Codec changes etc.). With the seed above, what happened is that the MemoryPF
created a merged segment that was slightly bigger than the sum of all input
{{.ram}} files (I don't know why). That caused the test to fail, because of
this math:
{noformat}
startIndexSize = 380K
finalIndexSize = 416K
380 * 3 < 380 (source) + 416 (final) + [380 TO 416] (temp files)
{noformat}
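To make the arithmetic above easy to re-check, here is a minimal sketch in plain Java. It is not part of the patch; the class and method names are hypothetical, and it assumes the temp files fall somewhere between the starting and final index sizes, as in the {noformat} block.

```java
// Hypothetical helper, not from the patch: re-checks the numbers quoted above.
class ForceMergeSpaceMath {

    // Lower estimate of peak usage in KB: source + final + smallest temp-file estimate.
    static long peakLowKb(long startKb, long finalKb) {
        return startKb + finalKb + Math.min(startKb, finalKb);
    }

    // Upper estimate of peak usage in KB: source + final + largest temp-file estimate.
    static long peakHighKb(long startKb, long finalKb) {
        return startKb + finalKb + Math.max(startKb, finalKb);
    }

    // The old test bound: peak usage must not exceed 3X the starting index size.
    static boolean oldBoundHolds(long startKb, long peakKb) {
        return peakKb <= 3 * startKb;
    }

    public static void main(String[] args) {
        long startKb = 380;
        long finalKb = 416;
        long low = peakLowKb(startKb, finalKb);   // 380 + 416 + 380 = 1176
        long high = peakHighKb(startKb, finalKb); // 380 + 416 + 416 = 1212
        System.out.println("old limit (3X start) = " + (3 * startKb)); // 1140
        System.out.println("peak estimate = [" + low + " TO " + high + "]");
        System.out.println("old bound holds at low estimate: " + oldBoundHolds(startKb, low));
    }
}
```

Even the low estimate (1176K) is above the old 1140K limit, so the old assertion fails whenever the merged index comes out larger than the source.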
The assertion message also shows that we don't require up to 3X
*additional* free space (which would mean comparing against 4X total space
used), as we only used 3.15X the starting index size.
I've changed the test to assert on the maximum size before and after the merge,
and compare {{maxUsedSizeInBytes}} to 3X that size. This also allows the
current test behavior, which changes the codec after indexing is done, and
before the merge is executed. In fact, unless I'm missing something, we may
simply have been lucky so far that the test did not trip, e.g. if the default
codec had been switched to SimpleText just before the merge...
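The revised check described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code in the patch: the class and method names are hypothetical, and only {{maxUsedSizeInBytes}} corresponds to a name mentioned in this comment.

```java
// Hypothetical sketch of the revised bound, not the actual test code.
class ForceMergeBound {

    // Allowed peak: 3X the larger of the index size before and after the merge.
    static long allowedPeakBytes(long startIndexSize, long finalIndexSize) {
        return 3 * Math.max(startIndexSize, finalIndexSize);
    }

    // True if the observed peak disk usage stays within the allowed bound.
    static boolean withinBound(long startIndexSize, long finalIndexSize, long maxUsedSizeInBytes) {
        return maxUsedSizeInBytes <= allowedPeakBytes(startIndexSize, finalIndexSize);
    }

    public static void main(String[] args) {
        long start = 380L * 1024; // starting index size from the failing seed
        long fin = 416L * 1024;   // final index size from the failing seed
        long peak = 1212L * 1024; // upper estimate: 380K + 416K + 416K
        System.out.println("allowed = " + allowedPeakBytes(start, fin));
        System.out.println("within bound = " + withinBound(start, fin, peak));
    }
}
```

Taking the maximum of the two sizes is what lets the check tolerate a merged index that ends up larger than the source, for example after a codec change.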
> IndexWriter.forceMerge documentation error
> ------------------------------------------
>
> Key: LUCENE-5941
> URL: https://issues.apache.org/jira/browse/LUCENE-5941
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-5941.patch, LUCENE-5941.patch
>
>
> IndexWriter.forceMerge documents that it requires up to 3X *FREE* space in
> order to run successfully. We even go further with it and test it in
> TestIWForceMerge.testForceMergeTempSpaceUsage(). But I think that's wrong. I
> cannot think of a situation where we consume 3X *additional* space during
> merge:
> * 1X - that's the source segments to be merged
> * 2X - that's the result non-CFS merged segment
> * 3X - that's the CFS creation
> At no point do we publish the non-CFS merged segment, therefore the merge, as
> I understand it, only consumes up to 2X additional space during that merge.
> And anyway, we only require 2X additional space for the *largest* merge (or
> total batch of running merges, depending on your MergeScheduler), not the whole
> index size. This is an important observation, since if you e.g. have a 500GB
> index, users shouldn't think they need to reserve an additional 1TB for
> merging, since most of their big segments won't be merged by default anyway
> (TieredMP defaults to 5GB largest segment).
> I'll post a patch which fixes the documentation and the test. If anyone can
> think of a scenario where we consume up to 3X *additional* space, please
> chime in, and I'll only modify IW.forceMerge documentation to explain that.