Re: IndexWriter memory leak?

Ruben Laguna Wed, 07 Apr 2010 15:24:13 -0700

I want to add that I tried this in both 2.9.0 and 3.0.1 and got the same
"leaky" behavior.


See [3] for a screenshot showing a 67MB zzBuffer on Lucene 3.0.1.

I managed to get rid of the Reader "memory leak" by setting the reference to
the underlying Tika Reader to null in my wrapper when the wrapper is closed.
But I still think it would be nicer if IndexWriter didn't keep references to
the Readers after indexing.

[3] http://img.skitch.com/20100407-ntn2kg13fx49wx4q118bp9h1hb.jpg
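For illustration, here is a minimal sketch of the kind of wrapper described above. It is not my actual code: the class name ReleasingReader and the isReleased() helper are hypothetical, and any java.io.Reader stands in for the Tika Reader. It maps IOExceptions from the wrapped Reader to end-of-stream and drops the reference to the wrapped Reader in close(), so a Field that IndexWriter still references should only keep the small wrapper reachable, not the parser's buffers.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical wrapper (illustrative name): hides IOExceptions from the
// consumer and releases the wrapped Reader when closed.
class ReleasingReader extends Reader {
    private Reader delegate; // e.g. the Reader returned by Tika (assumption)

    ReleasingReader(Reader delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (delegate == null) {
            return -1; // already closed: report end of stream
        }
        try {
            return delegate.read(cbuf, off, len);
        } catch (IOException e) {
            // Treat a parser failure as end of input so the indexer
            // never sees the exception.
            return -1;
        }
    }

    @Override
    public void close() throws IOException {
        Reader d = delegate;
        delegate = null; // drop the reference so the wrapped Reader can be GC'd
        if (d != null) {
            d.close();
        }
    }

    // Helper for checking that the wrapped Reader was released.
    boolean isReleased() {
        return delegate == null;
    }
}
```

In Lucene 2.9/3.0 the wrapper can be passed to the Field(String name, Reader reader) constructor; even if IndexWriter holds on to that Field afterwards, only the wrapper object stays reachable once it has been closed.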


On Wed, Apr 7, 2010 at 10:35 PM, Ruben Laguna <ruben.lag...@gmail.com> wrote:

> Hi,
>
> It seems like my IndexWriter, after committing and optimizing, has a retained
> size of 140MB. See [1] for a screenshot of the heap dump analysis done with
> Eclipse MAT.
>
> Of those 140MB, 67MB are retained by
> analyzer.tokenStreams.hardRefs.table.HashMap$Entry.value.tokenStream.scanner.zzBuffer
>
>
> Why is this? Is it a memory leak, or did I do something wrong during
> indexing? (BTW, I'm indexing documents which contain Fields(xxxx, Reader), and
> those Readers are wrappers around Tika.parse(xxxx) Readers. I get a lot of
> IOExceptions from the Tika readers, and the wrapper maps the exceptions to EOF
> so Lucene doesn't see them.)
>
>
>
> ...and 73MB of the 140MB are retained by docWriter (see [2]). It looks like
> the Field objects in the
> array docWriter.threadStates[0].consumer.fieldHash[1].fields[xxxx] are
> holding references to the Readers. Those reader instances are actually
> closed after IndexWriter.updateDocument. Each one of those Readers retains
> 1MB. The question is why IndexWriter holds references to those Readers after
> the Documents have been indexed.
>
>
> [1] http://img.skitch.com/20100407-1183815yiausisg73u9wfgscsj.jpg
> [2] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg
>
> --
> /Rubén
>



-- 
/Rubén
