[jira] [Created] (LUCENE-6383) MemoryPostings fst encoding can be surprisingly inefficient (especially in tests, with payloads)

Robert Muir (JIRA) Tue, 31 Mar 2015 21:30:05 -0700

Robert Muir created LUCENE-6383:
-----------------------------------

             Summary: MemoryPostings fst encoding can be surprisingly inefficient (especially in tests, with payloads)
                 Key: LUCENE-6383
                 URL: https://issues.apache.org/jira/browse/LUCENE-6383
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir



I just worked around this in two nightly OOM failures.

One was TestDuelingCodecs; the other was TestIndexWriterForceMerge's space 
usage test.

In general the trend is the same: the more documents you merge, the bigger the 
FST outputs get, and the size of this postings format (PF) in RAM and on disk 
grows in a way you don't expect. For example, merging 300KB of segments 
resulted in a single 450KB segment, and memory usage gets absurdly high.

The issue seems especially aggravated in tests, where MockAnalyzer adds lots of 
payloads.

Maybe it should encode the postings data in a more efficient way? Can it just 
use a Long output pointing into a RAMFile or something? Or maybe there is just 
a crazy bug?
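For illustration only, the "Long output pointing into a RAMFile" idea might look roughly like the plain-Java sketch below. It is hypothetical: a TreeMap stands in for the FST, a ByteArrayOutputStream stands in for a RAMFile, frequencies and payloads are omitted, and none of the class or method names come from Lucene's actual MemoryPostingsFormat.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch: each term maps to ONE long pointer into a shared byte
 * buffer holding all postings, instead of carrying its postings bytes as the
 * FST output. TreeMap stands in for the FST; ByteArrayOutputStream stands in
 * for a RAMFile. This is not Lucene's actual implementation.
 */
public class PointerPostingsSketch {
    private final ByteArrayOutputStream postings = new ByteArrayOutputStream();
    private final TreeMap<String, Long> termToPointer = new TreeMap<>();
    private byte[] frozen;

    // Write a non-negative int as a variable-length int (7 bits per byte).
    private void writeVInt(int v) {
        while ((v & ~0x7F) != 0) {
            postings.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        postings.write(v);
    }

    // Read a vInt starting at pos[0], advancing pos[0] past it.
    private int readVInt(int[] pos) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = frozen[pos[0]++];
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Appends sorted doc IDs as deltas; the term's only "output" is the start offset. */
    public void addTerm(String term, int[] sortedDocIds) {
        termToPointer.put(term, (long) postings.size());
        writeVInt(sortedDocIds.length);
        int prev = 0;
        for (int doc : sortedDocIds) {
            writeVInt(doc - prev);
            prev = doc;
        }
    }

    /** Snapshots the buffer so postings can be read back. */
    public void freeze() {
        frozen = postings.toByteArray();
    }

    /** Follows the term's pointer and decodes its doc IDs. */
    public List<Integer> docs(String term) {
        List<Integer> out = new ArrayList<>();
        Long ptr = termToPointer.get(term);
        if (ptr == null) {
            return out;
        }
        int[] pos = { ptr.intValue() };
        int count = readVInt(pos);
        int doc = 0;
        for (int i = 0; i < count; i++) {
            doc += readVInt(pos);
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        PointerPostingsSketch s = new PointerPostingsSketch();
        s.addTerm("apple", new int[] {3, 7, 200});
        s.addTerm("banana", new int[] {1});
        s.freeze();
        System.out.println(s.docs("apple"));  // [3, 7, 200]
        System.out.println(s.docs("banana")); // [1]
    }
}
```

In this layout the per-term output stays a single long no matter how many documents or payloads a term accumulates, and the postings bytes live in one contiguous buffer. Whether this would actually avoid the growth described above is untested here.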



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

