Yes, it happens as part of the early morning optimize, and yes, it's a forceMerge(1) which I've disabled for now.

I haven't looked at the persistence mechanism for Lucene since 2.x, but if I remember correctly, the deleted documents would stay in an index segment until that segment was eventually merged. Without forcing a merge (optimize in old versions), the footprint on disk could be a multiple of the actual space required for the live documents, and this would have an impact on performance (the deleted documents would clutter the buffer cache).

Is this still the case? I would have thought it good practice to force the dead space out of an index periodically, but if the underlying storage mechanism has changed and the current index files are more efficient at housekeeping, this may no longer be necessary.

If someone could shed a little light on best practice for indexes where documents are frequently updated (i.e. deleted and re-added), that would be great.
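
To make the dead-space concern concrete, here is a minimal sketch of the arithmetic, written in plain Java with no Lucene dependency. The class and method names (`DeadSpaceEstimate`, `deletedFraction`, `worthReclaiming`) are hypothetical and not part of any Lucene API. In a real application the two inputs would presumably come from `IndexReader.maxDoc()` and `IndexReader.numDocs()`. The figures in `main` use the numbers from this thread (about 14 million live documents, about 1 million updates per day) and assume the worst case, where no background merge reclaims anything.

```java
// Sketch only: illustrates the deleted-document ratio, not Lucene's merge logic.
final class DeadSpaceEstimate {

    /**
     * Fraction of document slots held by deleted documents.
     * maxDoc counts live plus deleted documents; numDocs counts live documents only.
     */
    static double deletedFraction(long maxDoc, long numDocs) {
        if (maxDoc < 0 || numDocs < 0 || numDocs > maxDoc) {
            throw new IllegalArgumentException("expected 0 <= numDocs <= maxDoc");
        }
        if (maxDoc == 0) {
            return 0.0; // empty index: nothing to reclaim
        }
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    /** True when the deleted fraction exceeds the caller's threshold (an assumed policy). */
    static boolean worthReclaiming(long maxDoc, long numDocs, double threshold) {
        return deletedFraction(maxDoc, numDocs) > threshold;
    }

    public static void main(String[] args) {
        long live = 14_000_000L;          // live documents, from the thread
        long updatesPerDay = 1_000_000L;  // each update deletes one document and adds one

        // Worst case: assume no merge reclaims any deleted document.
        for (int day = 1; day <= 3; day++) {
            long maxDoc = live + day * updatesPerDay;
            System.out.printf("day %d: %.1f%% of slots deleted, reclaim=%b%n",
                    day,
                    100.0 * deletedFraction(maxDoc, live),
                    worthReclaiming(maxDoc, live, 0.10));
        }
    }
}
```

If the ratio does climb past whatever threshold is acceptable, `IndexWriter.forceMergeDeletes()` may be a lighter alternative to `forceMerge(1)`, since it only targets segments carrying deletions rather than rewriting the whole index into one segment. Whether it avoids the OOM in this case is untested.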

Michael.


On 2013/09/26 11:43 AM, Ian Lea wrote:
Is this OOM happening as part of your early morning optimize or at
some other point?  By optimize do you mean IndexWriter.forceMerge(1)?
You really shouldn't have to use that. If the index grows forever
without it then something else is going on which you might wish to
report separately.


--
Ian.


On Wed, Sep 25, 2013 at 12:35 PM, Michael van Rooyen <mich...@loot.co.za> wrote:
We've recently upgraded to Lucene 4.4.0 and mergeSegments now causes an OOM
error.

As background, our index contains about 14 million documents (growing
slowly) and we process about 1 million updates per day. It's about 8GB on
disk.  I'm not sure if the Lucene segments merge the way they used to in the
early versions, but we've always optimized at 3am to get rid of dead space
in the index, or otherwise it grows forever.

The mergeSegments was working under 4.3.1 but the index has grown somewhat
on disk since then, probably due to a couple of added NumericDocValues
fields.  The Java process is assigned about 3GB (the maximum, as it's
running on a 32-bit i686 Linux box), and it still goes OOM.

Any advice as to the possible cause and how to circumvent it would be great.
Here's the stack trace:

org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.OutOfMemoryError: Java heap space
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)


Thanks,
Michael.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
