This would be a very good thing to try, given that you have some huge documents that, indexed alone, use far more than your RAM buffer.
Mike On Tue, Apr 13, 2010 at 12:19 AM, Lance Norskog <goks...@gmail.com> wrote: > There is some bugs where the writer data structures retain data after > it is flushed. They are committed as of maybe the past week. If you > can pull the trunk and try it with your use case, that would be great. > > On Mon, Apr 12, 2010 at 8:54 AM, Woolf, Ross <ross_wo...@bmc.com> wrote: >> I was on vacation last week so just getting back to this... Here is the >> info stream (as an attachment). I'll see what I can do about reducing the >> heap dump (It was supplied by a colleague). >> >> >> -----Original Message----- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Saturday, April 03, 2010 3:39 AM >> To: java-user@lucene.apache.org >> Subject: Re: IndexWriter and memory usage >> >> Hmm why is the heap dump so immense? Normally it contains the top N >> (eg 100) object types and their count/aggregate RAM usage. >> >> Can you attach the infoStream output to an email (to java-user)? >> >> Mike >> >> On Fri, Apr 2, 2010 at 5:28 PM, Woolf, Ross <ross_wo...@bmc.com> wrote: >>> I have this and the heap dump is 63mb zipped. The info stream is much >>> smaller (31 kb zipped), but I don't know how to get them to you. >>> >>> We are not using the NRT readers >>> >>> -----Original Message----- >>> From: Michael McCandless [mailto:luc...@mikemccandless.com] >>> Sent: Thursday, April 01, 2010 5:21 PM >>> To: java-user@lucene.apache.org >>> Subject: Re: IndexWriter and memory usage >>> >>> Hmm, not good. Can you post a heap dump? Also, can you turn on >>> infoStream, index up to the OOM @ 512 MB, and post the output? >>> >>> IndexWriter should not hang onto much beyond the RAM buffer. But, it >>> does allocate and then recycle this RAM buffer, so even in an idle >>> state (having indexed enough docs to fill up the RAM buffer at least >>> once) it'll hold onto those 16 MB. >>> >>> Are you using getReader (to get your NRT readers)? If so, are you >>> really sure you're eventually closing the previous reader after >>> opening a new one? >>> >>> Mike >>> >>> On Thu, Apr 1, 2010 at 6:58 PM, Woolf, Ross <ross_wo...@bmc.com> wrote: >>>> We are seeing a situation where the IndexWriter is using up the Java Heap >>>> space and only releases memory for garbage collection upon a commit. We >>>> are using the default RAMBufferSize of 16 mb. We are using Lucene 2.9.1. >>>> We are set at heap size of 512 mb. >>>> >>>> We have a large number of documents that are run through Tika and then >>>> added to the index. The data from Tika is changed to a string, and then >>>> sent to Lucene. Heap dumps clearly show the data in the Lucene classes >>>> and not in Tika. Our intent is to only perform a commit once the entire >>>> indexing run is complete, but several hours into the process everything >>>> comes to a crawl. In using both JConsole and VisualVM we can see that >>>> the heap space is maxed out and garbage collection is not able to clean up >>>> any memory once we get into this state. It is our understanding that the >>>> IndexWriter should be only holding onto 16 mb of data before it flushes >>>> it, but what we are seeing is that while it is in fact writing data to >>>> disk when it hits the 16 mb limit, it is also holding onto some data in >>>> memory and not allowing garbage collection to take place, and this >>>> continues until garbage collection is unable to free up enough space to >>>> all things to move faster than a crawl. >>>> >>>> As a test we caused a commit to occur after each document is indexed and >>>> we see the total amount of memory reduced from nearly 100% of the Java >>>> Heap to around 70-75%. The profiling tools now show that the memory is >>>> cleaned up to some extent after each document. But of course this >>>> completely defeats the whole reason why we want to only commit at the end >>>> of the run for performance sake. Most of the data, as seen using Heap >>>> analasis, is held in Byte, Character, and Integer classes whos GC roots >>>> are tied back to the Writer Objects and threads. The instance counts, >>>> after running just 1,100 documents seems staggering >>>> >>>> Is there additional data that the IndexWriter hangs onto regardless of >>>> when it hits the RAMBufferSize limit? Why are we seeing the heap space >>>> all being used up? >>>> >>>> A side question to this is the fact that we always see a large amount of >>>> memory used by the IndexWriter even after our indexing has been completed >>>> and all commits have taken place (basically in an idle state). Why would >>>> this be? Is the only way to totally clean up the memory is to close the >>>> writer? Our index is also used for real time indexing so the IndexWriter >>>> is intended to remain open for the lifetime of the app. >>>> >>>> Any help in understanding why the IndexWriter is maxing out our heap space >>>> or what is expected from memory usage of the IndexWriter would be >>>> appreciated. >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > > > -- > Lance Norskog > goks...@gmail.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org