[ https://issues.apache.org/jira/browse/LUCENE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389982#comment-14389982 ]
Robert Muir commented on LUCENE-6383:
-------------------------------------
We should also look into why [~jpountz]'s test for "things getting bigger on
merge" (BaseIndexFileFormatTestCase.testMergeStability) doesn't find this.
Maybe the problems are specific to payloads? BasePostingsFormat.addRandomFields
iterates through all the possible index options, but never adds any payloads.
Could be a tricky thing to do in tests in general, because of optimizations
when payloads have the same length.
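
To make the same-length point concrete, here is a toy sketch of the kind of
optimization meant: the payload length is written only when it differs from
the previous position's, flagged in the low bit of the position delta. This is
an illustrative model, not Lucene's code; the class and method names
(PayloadLengthSketch, encode, writeVInt) are made up for the example.

```java
import java.io.ByteArrayOutputStream;

public class PayloadLengthSketch {

    // Writes a non-negative int in variable-length form, 7 bits per byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes positions with one payload each. The low bit of each position
    // delta flags whether the payload length changed since the previous
    // position; the length itself is written only when that bit is set.
    static byte[] encode(int[] positions, byte[][] payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastPosition = 0;
        int lastPayloadLength = -1;
        for (int i = 0; i < positions.length; i++) {
            int delta = positions[i] - lastPosition;
            lastPosition = positions[i];
            int length = payloads[i].length;
            if (length != lastPayloadLength) {
                writeVInt(out, (delta << 1) | 1);
                writeVInt(out, length);
                lastPayloadLength = length;
            } else {
                writeVInt(out, delta << 1);
            }
            out.write(payloads[i], 0, length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] positions = {0, 1, 2, 3};
        byte[][] sameLength = {new byte[2], new byte[2], new byte[2], new byte[2]};
        byte[][] mixedLength = {new byte[1], new byte[3], new byte[1], new byte[3]};
        System.out.println("same length:  " + encode(positions, sameLength).length + " bytes");
        System.out.println("mixed length: " + encode(positions, mixedLength).length + " bytes");
    }
}
```

Both inputs carry the same total payload bytes, but the mixed-length one pays
for an extra length on every position. So a test that only ever adds
fixed-length payloads would never hit the more expensive path.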
There is a TODO in TestMemoryPostingsFormat to randomize pack=true/false. Maybe
it's related too: TestMemoryPostingsFormat never tests that, but RandomCodec
randomizes the option.
> MemoryPostings fst encoding can be surprisingly inefficient (especially in
> tests, with payloads)
> ------------------------------------------------------------------------------------------------
>
> Key: LUCENE-6383
> URL: https://issues.apache.org/jira/browse/LUCENE-6383
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
>
> I just worked around this in 2 nightly OOM fails.
> One was TestDuelingCodecs, the other was TestIndexWriterForceMerge's space
> usage test.
> In general the trend is the same: the more documents you merge, the bigger
> the FST outputs get, and the size of this PF in RAM and on disk grows in a
> way you don't expect. E.g. merging 300KB of segments resulted in a 450KB
> single segment, and memory usage gets absurdly high.
> The issue seems especially aggravated in tests, when MockAnalyzer adds lots
> of payloads.
> Maybe it should encode the postings data in a more efficient way? Can it just
> use a Long output pointing into a RAMFile or something? Or maybe there is
> just a crazy bug?