[ 
https://issues.apache.org/jira/browse/LUCENE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390135#comment-14390135
 ] 

Adrien Grand commented on LUCENE-6383:
--------------------------------------

bq. We should also look into why Adrien Grand's test for "things getting bigger 
on merge" (BaseIndexFileFormatTestCase.testMergeStability) doesn't find this. 

From your description of the problem, it looks to me like payloads are 
inefficiently encoded, but not like they wrongly accumulate upon merging 
(which is what the test checks)? We added this test when we found a bug in a 
codec that kept copying the codec footer when merging, so that after N 
merges some segment files would have N codec footers (with only the last one 
containing the right checksum). The issue looks different here?
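
To make the footer example concrete, here is a rough, self-contained sketch of the kind of accumulation bug described above. It does not use any Lucene API: the class name, the 4-byte FOOTER marker and the write/stableMerge/buggyMerge helpers are all made up for illustration, and the real test compares actual index files rather than byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MergeStabilitySketch {
    // Hypothetical 4-byte footer marker; NOT the real Lucene codec footer.
    static final byte[] FOOTER = "FOOT".getBytes(StandardCharsets.US_ASCII);

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Writes a fresh "segment file": the payload followed by one footer. */
    static byte[] write(byte[] payload) {
        return concat(payload, FOOTER);
    }

    /** Well-behaved merge: drops the old footer, then writes a new one. */
    static byte[] stableMerge(byte[] file) {
        return write(Arrays.copyOf(file, file.length - FOOTER.length));
    }

    /** Buggy merge: copies the old footer along with the data, then appends another. */
    static byte[] buggyMerge(byte[] file) {
        return write(file);
    }

    public static void main(String[] args) {
        byte[] stable = write(new byte[100]);
        byte[] buggy = stable;
        for (int i = 0; i < 5; i++) {
            stable = stableMerge(stable);
            buggy = buggyMerge(buggy);
        }
        // The stable file keeps its size; the buggy one grows by one footer per merge.
        System.out.println("stable size after 5 merges: " + stable.length);
        System.out.println("buggy size after 5 merges:  " + buggy.length);
    }
}
```

A check of this shape only flags files that keep growing when they are merged again. An encoding that is merely wasteful, but produces the same size on every re-merge, would pass it.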

> MemoryPostings fst encoding can be surprisingly inefficient (especially in 
> tests, with payloads)
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6383
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6383
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> I just worked around this in 2 nightly OOM fails.
> One was TestDuelingCodecs, the other was TestIndexWriterForceMerge's space 
> usage test.
> In general the trend is the same: the more documents you merge, the bigger 
> the FST outputs get, and the size of this PF in RAM and on disk grows in a 
> way you don't expect. E.g. merging 300KB of segments resulted in a 450KB 
> single segment, and memory usage gets absurdly high.
> The issue seems especially aggravated in tests, when MockAnalyzer adds lots 
> of payloads.
> Maybe it should encode the postings data in a more efficient way? Can it just 
> use a Long output pointing into a RAMFile or something? Or maybe there is 
> just a crazy bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
