I want to add that I tried this in both 2.9.0 and 3.0.1 and got the same "leaky" behavior.
See [3] for a screenshot showing a 67MB zzBuffer on Lucene 3.0.1.

I managed to get rid of the Readers' "memory leak" by setting the wrapper's reference to the underlying Tika Reader to null when the wrapper is closed. But I still think it would be nicer if IndexWriter didn't keep references to the Readers after indexing.

[3] http://img.skitch.com/20100407-ntn2kg13fx49wx4q118bp9h1hb.jpg

On Wed, Apr 7, 2010 at 10:35 PM, Ruben Laguna <ruben.lag...@gmail.com> wrote:

> Hi,
>
> It seems like my IndexWriter, after committing and optimizing, has a
> retained size of 140MB. See [1] for a screenshot of the heap dump analysis
> done with Eclipse MAT.
>
> Of those 140MB, 67MB are retained by
> analyzer.tokenStreams.hardRefs.table.HashMap$Entry.value.tokenStream.scanner.zzBuffer
>
> Why is this? Is it a memory leak, or did I do something wrong during the
> indexing? (BTW, I'm indexing documents which contain Fields(xxxx,Reader),
> and those Readers are wrappers around Tika.parse(xxxx) Readers. I get a lot
> of IOExceptions from the Tika readers, and the wrapper maps the exceptions
> to EOF so Lucene doesn't see them.)
>
> ...and 73MB of the 140MB are retained by docWriter, see [2]. It looks like
> the Field objects in the array
> docWriter.threadStates[0].consumer.fieldHash[1].fields[xxxx] are holding
> references to the Readers. Those Reader instances are actually closed after
> IndexWriter.updateDocument. Each one of those Readers retains 1MB. The
> question is why IndexWriter holds references to those Readers after the
> Documents have been indexed.
>
> [1] http://img.skitch.com/20100407-1183815yiausisg73u9wfgscsj.jpg
> [2] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg
>
> --
> /Rubén

--
/Rubén
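
P.S. A minimal sketch of the wrapper workaround described above, in case it helps anyone hitting the same thing. This is not the exact code: the class name ReleasingReader and the isReleased() helper are hypothetical, and it assumes the wrapped Reader is whatever Tika.parse(xxxx) returned. It only shows the two ideas from this thread, mapping IOExceptions to EOF and dropping the reference to the underlying Reader on close.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical wrapper (illustrative name, not the original code).
 * It maps IOExceptions from the wrapped Reader to EOF and releases
 * the wrapped Reader on close(), so that anything still holding the
 * wrapper no longer keeps the wrapped Reader's buffers reachable.
 */
class ReleasingReader extends Reader {
    private Reader delegate; // set to null once the wrapper is closed

    ReleasingReader(Reader delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        Reader r = delegate;
        if (r == null) {
            return -1; // already closed: report EOF
        }
        try {
            return r.read(cbuf, off, len);
        } catch (IOException e) {
            return -1; // hide parser failures from the caller as EOF
        }
    }

    @Override
    public void close() {
        Reader r = delegate;
        delegate = null; // drop the reference before closing
        if (r != null) {
            try {
                r.close();
            } catch (IOException e) {
                // ignored in this sketch: the reference is already released
            }
        }
    }

    /** Hypothetical helper, only here to make the behavior checkable. */
    boolean isReleased() {
        return delegate == null;
    }
}
```

With this, a Field that still points at the wrapper after indexing only retains the small wrapper object, not the wrapped Reader. It does not change what IndexWriter itself holds on to.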