Run this:

    svn co https://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9
lucene.29x

Then apply the patch, then, run "ant jar-core", and in that should
create the lucene-core-2.9.2-dev.jar.

Mike

On Wed, Apr 14, 2010 at 1:28 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
> How do I get to the 2.9.x branch?  Every link I take from the Lucene site 
> takes me to the trunk which I assume is the 3.x version.  I've tried to look 
> around svn but can't find anything labeled 2.9.x.  Is there a daily build of 
> 2.9.x or do I need to build it myself.  I would like to try out the fix you 
> put into it, but I'm not sure where I get it from.
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Wednesday, April 14, 2010 4:12 AM
> To: java-user@lucene.apache.org
> Subject: Re: IndexWriter and memory usage
>
> It looks like the mailing list software stripped your image attachments...
>
> Alas these fixes are only committed on 3.1.
>
> But I just posted the patch on LUCENE-2387 for 2.9.x -- it's a tiny
> fix.  I think the other issue was part of LUCENE-2074 (though this
> issue included many other changes) -- Uwe can you peel out just a
> 2.9.x patch for resetting JFlex's zzBuffer?
>
> You could also try switching analyzers (eg to WhitespaceAnalyzer) to
> see if in fact LUCENE-2074 (which affects StandandAnalyzer, since it
> uses JFlex) is [part of] your problem.
>
> Mike
>
> On Tue, Apr 13, 2010 at 6:42 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
>> Since the heap dump was so big and can't be attached, I have taken a few 
>> screen shots from Java VisualVM of the heap dump.  In the first image you 
>> can see that at the time our memory has become very tight most of it is held 
>> up in bytes.  In the second image I examine one of those instances and 
>> navigate to the nearest garbage collection root.  In looking at very many of 
>> these objects, they all end up being instantiated through the IndexWriter 
>> process.
>>
>> This heap dump is the same one correlating to the infoStream that was 
>> attached in a prior message.  So while the infoStream shows the buffer being 
>> flushed, what we experience is that our memory gets consumed over time by 
>> these bytes in the IndexWriter.
>>
>> I wanted to provide these images to see if they might correlate to the fixes 
>> mentioned below.  Hopefully those fixes mentioned below have rectified this 
>> problem.  And as I state in the prior message, I'm hoping these fixes are in 
>> a 2.9x branch and hoping for someone to point me to where I can get those 
>> fixes to try out.
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Woolf, Ross [mailto:ross_wo...@bmc.com]
>> Sent: Tuesday, April 13, 2010 1:29 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: IndexWriter and memory usage
>>
>> Are these fixes in 2.9x branch?  We are using 2.9x and can't move to 3x just 
>> yet.  If so, where do I specifically pick this up from?
>>
>> -----Original Message-----
>> From: Lance Norskog [mailto:goks...@gmail.com]
>> Sent: Monday, April 12, 2010 10:20 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: IndexWriter and memory usage
>>
>> There is some bugs where the writer data structures retain data after
>> it is flushed. They are committed as of maybe the past week. If you
>> can pull the trunk and try it with your use case, that would be great.
>>
>> On Mon, Apr 12, 2010 at 8:54 AM, Woolf, Ross <ross_wo...@bmc.com> wrote:
>>> I was on vacation last week so just getting back to this...  Here is the 
>>> info stream (as an attachment).  I'll see what I can do about reducing the 
>>> heap dump (It was supplied by a colleague).
>>>
>>>
>>> -----Original Message-----
>>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>>> Sent: Saturday, April 03, 2010 3:39 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: IndexWriter and memory usage
>>>
>>> Hmm why is the heap dump so immense?  Normally it contains the top N
>>> (eg 100) object types and their count/aggregate RAM usage.
>>>
>>> Can you attach the infoStream output to an email (to java-user)?
>>>
>>> Mike
>>>
>>> On Fri, Apr 2, 2010 at 5:28 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
>>>> I have this and the heap dump is 63mb zipped.  The info stream is much 
>>>> smaller (31 kb zipped), but I don't know how to get them to you.
>>>>
>>>> We are not using the NRT readers
>>>>
>>>> -----Original Message-----
>>>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>>>> Sent: Thursday, April 01, 2010 5:21 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: IndexWriter and memory usage
>>>>
>>>> Hmm, not good.  Can you post a heap dump?  Also, can you turn on
>>>> infoStream, index up to the OOM @ 512 MB, and post the output?
>>>>
>>>> IndexWriter should not hang onto much beyond the RAM buffer.  But, it
>>>> does allocate and then recycle this RAM buffer, so even in an idle
>>>> state (having indexed enough docs to fill up the RAM buffer at least
>>>> once) it'll hold onto those 16 MB.
>>>>
>>>> Are you using getReader (to get your NRT readers)?  If so, are you
>>>> really sure you're eventually closing the previous reader after
>>>> opening a new one?
>>>>
>>>> Mike
>>>>
>>>> On Thu, Apr 1, 2010 at 6:58 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
>>>>> We are seeing a situation where the IndexWriter is using up the Java Heap 
>>>>> space and only releases memory for garbage collection upon a commit.   We 
>>>>> are using the default RAMBufferSize of 16 mb.  We are using Lucene 2.9.1. 
>>>>> We are set at heap size of 512 mb.
>>>>>
>>>>> We have a large number of documents that are run through Tika and then 
>>>>> added to the index.  The data from Tika is changed to a string, and then 
>>>>> sent to Lucene.  Heap dumps clearly show the data in the Lucene classes 
>>>>> and not in Tika.  Our intent is to only perform a commit once the entire 
>>>>> indexing run is complete, but several hours into the process everything 
>>>>> comes to a crawl.  In using both JConsole and VisualVM  we can see that 
>>>>> the heap space is maxed out and garbage collection is not able to clean 
>>>>> up any memory once we get into this state.  It is our understanding that 
>>>>> the IndexWriter should be only holding onto 16 mb of data before it 
>>>>> flushes it, but what we are seeing is that while it is in fact writing 
>>>>> data to disk when it hits the 16 mb limit, it is also holding onto some 
>>>>> data in memory and not allowing garbage collection to take place, and 
>>>>> this continues until garbage collection is unable to free up enough space 
>>>>> to all things to move faster than a crawl.
>>>>>
>>>>> As a test we caused a commit to occur after each document is indexed and 
>>>>> we see the total amount of memory reduced from nearly 100% of the Java 
>>>>> Heap to around 70-75%.  The profiling tools now show that the memory is 
>>>>> cleaned up to some extent after each document.  But of course this 
>>>>> completely defeats the whole reason why we want to only commit at the end 
>>>>> of the run for performance sake.  Most of the data, as seen using Heap 
>>>>> analasis, is held in Byte, Character, and Integer classes whos GC roots 
>>>>> are tied back to the Writer Objects and threads.  The instance counts, 
>>>>> after running just 1,100 documents seems staggering
>>>>>
>>>>> Is there additional data that the IndexWriter hangs onto regardless of 
>>>>> when it hits the RAMBufferSize limit?  Why are we seeing the heap space 
>>>>> all being used up?
>>>>>
>>>>> A side question to this is the fact that we always see a large amount of 
>>>>> memory used by the IndexWriter even after our indexing has been completed 
>>>>> and all commits have taken place (basically in an idle state).  Why would 
>>>>> this be?  Is the only way to totally clean up the memory is to close the 
>>>>> writer?  Our index is also used for real time indexing so the IndexWriter 
>>>>> is intended to remain open for the lifetime of the app.
>>>>>
>>>>> Any help in understanding why the IndexWriter is maxing out our heap 
>>>>> space or what is expected from memory usage of the IndexWriter would be 
>>>>> appreciated.
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to