[ https://issues.apache.org/jira/browse/LUCENE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390135#comment-14390135 ]
Adrien Grand commented on LUCENE-6383:
--------------------------------------
bq. We should also look into why Adrien Grand's test for "things getting bigger
on merge" (BaseIndexFileFormatTestCase.testMergeStability) doesn't find this.
From your description of the problem, it looks to me like payloads are
inefficiently encoded, but not that they wrongly accumulate upon merging
(which is what the test checks)? We added this test when we found a bug in a
codec that kept copying the codec footer when merging, so that after N
merges some segment files would have N codec footers (with only the last one
containing the right checksum). The issue looks different here?
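To make the distinction above concrete, here is a toy Java model. Everything in it (class, method names, the "FOOTER" bytes) is hypothetical and is not Lucene code or the actual implementation of testMergeStability; it only illustrates the kind of property such a test checks. A "file" is a data region followed by one footer. A buggy merge that copies the old footer as data makes the file grow on every merge, which a size-stability check catches. An encoding that is merely wasteful (here, every data byte stored twice) re-encodes to the same size on every merge, so the same check passes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Toy model (hypothetical, not Lucene code): a "file" is a data region plus one trailing footer.
final class MergeStabilitySketch {
    static final byte[] FOOTER = "FOOTER".getBytes(StandardCharsets.US_ASCII);

    // Write a file: data region followed by exactly one footer.
    static byte[] write(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, data.length);
        out.write(FOOTER, 0, FOOTER.length);
        return out.toByteArray();
    }

    // Everything before the trailing footer.
    static byte[] dataRegion(byte[] file) {
        return Arrays.copyOf(file, file.length - FOOTER.length);
    }

    // Correct merge: drop the old footer, then re-encode the data.
    static byte[] goodMerge(byte[] file) {
        return write(dataRegion(file));
    }

    // Buggy merge (the kind of bug described above): the old footer is copied as data.
    static byte[] buggyMerge(byte[] file) {
        return write(file);
    }

    // Wasteful but deterministic encoding: every data byte is stored twice.
    static byte[] writeVerbose(byte[] data) {
        byte[] doubled = new byte[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            doubled[2 * i] = data[i];
            doubled[2 * i + 1] = data[i];
        }
        return write(doubled);
    }

    // Merge for the verbose format: decode (keep every other byte), then re-encode.
    static byte[] verboseMerge(byte[] file) {
        byte[] doubled = dataRegion(file);
        byte[] data = new byte[doubled.length / 2];
        for (int i = 0; i < data.length; i++) {
            data[i] = doubled[2 * i];
        }
        return writeVerbose(data);
    }

    // Stability check in the spirit of the test: merging again must not change the size.
    static boolean isStable(byte[] file, UnaryOperator<byte[]> merge) {
        return merge.apply(file).length == file.length;
    }

    public static void main(String[] args) {
        byte[] data = "postings".getBytes(StandardCharsets.US_ASCII);
        byte[] good = write(data);
        byte[] bad = write(data);
        byte[] verbose = writeVerbose(data);
        for (int i = 0; i < 3; i++) {
            good = goodMerge(good);
            bad = buggyMerge(bad);
            verbose = verboseMerge(verbose);
        }
        System.out.println("good=" + good.length + " stable=" + isStable(good, MergeStabilitySketch::goodMerge));
        System.out.println("buggy=" + bad.length + " stable=" + isStable(bad, MergeStabilitySketch::buggyMerge));
        System.out.println("verbose=" + verbose.length + " stable=" + isStable(verbose, MergeStabilitySketch::verboseMerge));
    }
}
```

Running `main` should print `good=14 stable=true`, `buggy=32 stable=false` and `verbose=22 stable=true`: the verbose format is noticeably larger than the compact one, yet it never fails a check that only looks for growth across merges.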
> MemoryPostings fst encoding can be surprisingly inefficient (especially in
> tests, with payloads)
> ------------------------------------------------------------------------------------------------
>
> Key: LUCENE-6383
> URL: https://issues.apache.org/jira/browse/LUCENE-6383
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
>
> I just worked around this in 2 nightly OOM failures.
> One was TestDuelingCodecs, the other was TestIndexWriterForceMerge's space
> usage test.
> In general the trend is the same: the more documents you merge, the bigger
> the FST outputs get, and the size of this PF in RAM and on disk grows in a
> way you don't expect. E.g. merging 300KB of segments resulted in a 450KB
> single segment, and memory usage gets absurdly high.
> The issue seems especially aggravated in tests, where MockAnalyzer adds lots
> of payloads.
> Maybe it should encode the postings data in a more efficient way? Can it just
> use a Long output pointing into a RAMFile or something? Or maybe there is
> just a crazy bug?
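One way to read the "Long output pointing into a RAMFile" suggestion is sketched below. It is a minimal, hypothetical Java sketch, not how MemoryPostingsFormat works: a `TreeMap<String, Long>` stands in for an FST with Long outputs, a `ByteArrayOutputStream` stands in for a RAMFile, and the "postings" are placeholder strings rather than a real postings encoding. Each term's postings bytes are appended (length-prefixed with a 7-bits-per-byte vInt) to one shared buffer, and the dictionary stores only the starting offset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical sketch (not Lucene code): each term maps to a fixed-size long offset,
// while the variable-length postings bytes live in one shared in-memory buffer.
final class OffsetPostingsSketch {
    // Stand-in for an FST with Long outputs: a sorted map from term to offset.
    private final TreeMap<String, Long> termToOffset = new TreeMap<>();
    // Stand-in for a RAMFile: a single append-only buffer holding all postings.
    private final ByteArrayOutputStream postings = new ByteArrayOutputStream();
    // Read-side snapshot of the buffer, rebuilt lazily after writes.
    private byte[] snapshot;

    // Append a term's postings (length-prefixed) and remember where they start.
    void add(String term, byte[] postingsBytes) {
        long offset = postings.size();
        writeVInt(postingsBytes.length);
        postings.write(postingsBytes, 0, postingsBytes.length);
        termToOffset.put(term, offset);
        snapshot = null;
    }

    // Follow the term's offset, decode the length prefix, then copy the bytes.
    byte[] get(String term) {
        Long offset = termToOffset.get(term);
        if (offset == null) {
            return null;
        }
        if (snapshot == null) {
            snapshot = postings.toByteArray();
        }
        int pos = Math.toIntExact(offset);
        int len = 0;
        int shift = 0;
        int b;
        do {
            b = snapshot[pos++] & 0xFF;
            len |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return Arrays.copyOfRange(snapshot, pos, pos + len);
    }

    // Variable-length int: 7 bits per byte, low-order group first, high bit = "more follows".
    private void writeVInt(int v) {
        while ((v & ~0x7F) != 0) {
            postings.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        postings.write(v);
    }

    public static void main(String[] args) {
        OffsetPostingsSketch dict = new OffsetPostingsSketch();
        dict.add("apache", "docs:1,4,9".getBytes(StandardCharsets.UTF_8));
        dict.add("lucene", "docs:2,3".getBytes(StandardCharsets.UTF_8));
        // Prints "docs:2,3"
        System.out.println(new String(dict.get("lucene"), StandardCharsets.UTF_8));
    }
}
```

Under these assumptions the per-term output stays a single long no matter how many payload bytes a term's postings carry, so the dictionary structure no longer has to hold the postings themselves; the cost is one extra indirection per lookup.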
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)