I have this, and the heap dump is 63 MB zipped.  The infoStream output is much
smaller (31 KB zipped), but I don't know how to get them to you.

We are not using the NRT readers.

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com] 
Sent: Thursday, April 01, 2010 5:21 PM
To: java-user@lucene.apache.org
Subject: Re: IndexWriter and memory usage

Hmm, not good.  Can you post a heap dump?  Also, can you turn on
infoStream, index up to the OOM @ 512 MB, and post the output?
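
Turning it on is just a setter on the writer; roughly like this (the log
file name here is made up):

  // route IndexWriter's diagnostics to a file
  PrintStream out =
      new PrintStream(new FileOutputStream("iw-infostream.log"), true);
  writer.setInfoStream(out);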

IndexWriter should not hang onto much beyond the RAM buffer.  But, it
does allocate and then recycle this RAM buffer, so even in an idle
state (having indexed enough docs to fill up the RAM buffer at least
once) it'll hold onto those 16 MB.
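
That buffer size is adjustable if you want to trade memory for flush
frequency; roughly:

  // default is IndexWriter.DEFAULT_RAM_BUFFER_SIZE_MB (16.0 in 2.9)
  writer.setRAMBufferSizeMB(16.0);
  // optionally also flush by doc count (1000 here is arbitrary)
  writer.setMaxBufferedDocs(1000);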

Are you using getReader (to get your NRT readers)?  If so, are you
really sure you're eventually closing the previous reader after
opening a new one?
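
The usual pattern (just a sketch) is to reopen, and only close the old
reader when you actually got a new instance back:

  IndexReader reader = writer.getReader();   // initial NRT reader

  // later, after more docs have been indexed:
  IndexReader newReader = reader.reopen();   // same instance if nothing changed
  if (newReader != reader) {
    reader.close();                          // the old reader must be closed
    reader = newReader;
  }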

Mike

On Thu, Apr 1, 2010 at 6:58 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
> We are seeing a situation where the IndexWriter uses up the Java heap
> space and only releases memory for garbage collection upon a commit.  We are
> using the default RAMBufferSize of 16 MB, Lucene 2.9.1, and a heap size of
> 512 MB.
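>
> For reference, the writer is set up more or less like this (the path and
> analyzer here are simplified, not our exact code):
>
>   // JVM launched with -Xmx512m
>   Directory dir = FSDirectory.open(new File("/path/to/index"));
>   IndexWriter writer = new IndexWriter(dir,
>       new StandardAnalyzer(Version.LUCENE_29),
>       IndexWriter.MaxFieldLength.UNLIMITED);
>   // RAM buffer left at the default (16 MB)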
>
> We have a large number of documents that are run through Tika and then added
> to the index.  The data from Tika is converted to a string and then sent to
> Lucene.  Heap dumps clearly show the data in the Lucene classes and not in
> Tika.  Our intent is to perform a commit only once the entire indexing run is
> complete, but several hours into the process everything comes to a crawl.
> Using both JConsole and VisualVM we can see that the heap space is maxed out
> and garbage collection is not able to clean up any memory once we get into
> this state.  It is our understanding that the IndexWriter should only be
> holding onto 16 MB of data before it flushes, but what we are seeing is
> that while it does write data to disk when it hits the 16 MB limit,
> it is also holding onto some data in memory and not allowing garbage
> collection to reclaim it, and this continues until garbage collection is
> unable to free up enough space to allow things to move faster than a crawl.
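>
> To make the flow concrete, the indexing loop looks roughly like this (the
> Tika handling is simplified and the field name is illustrative):
>
>   AutoDetectParser parser = new AutoDetectParser();
>   for (File f : files) {
>     BodyContentHandler handler = new BodyContentHandler(-1);  // -1 = no write limit
>     InputStream in = new FileInputStream(f);
>     try {
>       parser.parse(in, handler, new Metadata(), new ParseContext());
>     } finally {
>       in.close();
>     }
>     Document doc = new Document();
>     doc.add(new Field("contents", handler.toString(),
>                       Field.Store.NO, Field.Index.ANALYZED));
>     writer.addDocument(doc);
>   }
>   writer.commit();  // intended to be the only commit, at the end of the run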
>
> As a test we caused a commit to occur after each document is indexed, and we
> see the total amount of memory used drop from nearly 100% of the Java heap to
> around 70-75%.  The profiling tools now show that the memory is cleaned up to
> some extent after each document.  But of course this completely defeats the
> whole reason we want to commit only at the end of the run, which is
> performance.  Most of the data, as seen in heap analysis, is held in Byte,
> Character, and Integer classes whose GC roots are tied back to the writer
> objects and threads.  The instance counts, after running just 1,100 documents,
> seem staggering.
>
> Is there additional data that the IndexWriter hangs onto regardless of when 
> it hits the RAMBufferSize limit?  Why are we seeing the heap space all being 
> used up?
>
> A side question to this is that we always see a large amount of
> memory used by the IndexWriter even after our indexing has completed and
> all commits have taken place (basically in an idle state).  Why would this
> be?  Is the only way to totally clean up the memory to close the writer?
> Our index is also used for real-time indexing, so the IndexWriter is intended
> to remain open for the lifetime of the app.
>
> Any help in understanding why the IndexWriter is maxing out our heap space or 
> what is expected from memory usage of the IndexWriter would be appreciated.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

