[
https://issues.apache.org/jira/browse/LUCENE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390318#comment-14390318
]
Robert Muir commented on LUCENE-6383:
-------------------------------------
In this case merging creates a bigger index: 300KB of segments becomes a 450KB
single segment. So it's not exactly the same problem...
> MemoryPostings fst encoding can be surprisingly inefficient (especially in
> tests, with payloads)
> ------------------------------------------------------------------------------------------------
>
> Key: LUCENE-6383
> URL: https://issues.apache.org/jira/browse/LUCENE-6383
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
>
> I just worked around this in 2 nightly OOM failures.
> One was TestDuelingCodecs, the other was TestIndexWriterForceMerge's space
> usage test.
> In general the trend is the same: the more documents you merge, the bigger
> the FST outputs get, and the size of this PF in RAM and on disk grows in a
> way you don't expect. E.g. merging 300KB of segments resulted in a 450KB
> single segment, and memory usage gets absurdly high.
> The issue seems especially aggravated in tests, when MockAnalyzer adds lots
> of payloads.
> Maybe it should encode the postings data in a more efficient way? Can it just
> use a Long output pointing into a RAMFile or something? Or maybe there is
> just a crazy bug?
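
A minimal sketch of the "Long output pointing into a RAMFile" idea from the
description, offered only as an illustration. It uses plain Java collections
rather than Lucene's FST or RAMFile classes: the sorted map stands in for an
FST with Long outputs, and the byte stream stands in for the RAMFile. The class
and method names (OffsetPostingsSketch, add, get, blobSize) are hypothetical
and are not part of Lucene or of MemoryPostingsFormat.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical illustration only: none of these names are Lucene APIs.
class OffsetPostingsSketch {
    // All postings bytes live in one shared buffer (stand-in for a RAMFile).
    private final ByteArrayOutputStream blob = new ByteArrayOutputStream();
    // Term -> start offset into the blob (stand-in for an FST with Long outputs).
    private final TreeMap<String, Long> offsets = new TreeMap<>();

    // Append a 4-byte big-endian length followed by the postings bytes,
    // and record only the starting offset for the term.
    void add(String term, byte[] postings) {
        offsets.put(term, (long) blob.size());
        int n = postings.length;
        blob.write(n >>> 24);
        blob.write(n >>> 16);
        blob.write(n >>> 8);
        blob.write(n);
        blob.write(postings, 0, n);
    }

    // Look up the offset, read the length prefix, and copy the postings out.
    // Returns null for an unknown term.
    byte[] get(String term) {
        Long start = offsets.get(term);
        if (start == null) {
            return null;
        }
        byte[] all = blob.toByteArray(); // copies the buffer; acceptable in a sketch
        int p = start.intValue();
        int n = ((all[p] & 0xFF) << 24)
              | ((all[p + 1] & 0xFF) << 16)
              | ((all[p + 2] & 0xFF) << 8)
              | (all[p + 3] & 0xFF);
        return Arrays.copyOfRange(all, p + 4, p + 4 + n);
    }

    // Total bytes held in the shared buffer, including length prefixes.
    long blobSize() {
        return blob.size();
    }
}
```

The point of the layout is that the per-term output is a single fixed-size
number regardless of how large the postings (and payloads) are, so the term
dictionary itself does not grow with the postings data. Whether this would
actually fix the growth reported here is not established by this sketch.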
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)