Hi,

About 2-3 weeks ago I emailed about a memory leak in my application.  I then 
found some problems in my code (I wasn't closing IndexSearchers explicitly) and 
took care of those.  Now I see my app is still leaking memory - jconsole 
clearly shows the "Tenured Gen" memory pool getting filled up until I hit the 
OOM, but I can't seem to pinpoint the source.

I found that a bunch of o.a.l.index.* objects are not getting GCed, even though 
they should.  For example:

$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr
num   #instances    #bytes  class name
--------------------------------------
  4:   1764840    98831040  org.apache.lucene.index.CompoundFileReader$CSIndexInput
  5:   2119215    67814880  org.apache.lucene.index.TermInfo
  7:   1112459    35598688  org.apache.lucene.index.SegmentReader$Norm
  9:   2132311    34116976  org.apache.lucene.index.Term
 12:   1117897    26829528  org.apache.lucene.index.FieldInfo
 13:    225340    18027200  org.apache.lucene.index.SegmentTermEnum
 15:    589727    14153448  org.apache.lucene.index.TermBuffer
 21:     86033     8718504  [Lorg.apache.lucene.index.TermInfo;
 20:     86033     8718504  [Lorg.apache.lucene.index.Term;
 23:     86120     7578560  org.apache.lucene.index.SegmentReader
 26:     90501     5068056  org.apache.lucene.store.FSIndexInput
 27:     86120     4822720  org.apache.lucene.index.TermInfosReader
 33:     86130     3445200  org.apache.lucene.index.SegmentInfo
 36:     87355     2795360  org.apache.lucene.store.FSIndexInput$Descriptor
 38:     86120     2755840  org.apache.lucene.index.FieldsReader
 39:     86050     2753600  org.apache.lucene.index.CompoundFileReader
 42:     46903     2251344  org.apache.lucene.index.SegmentInfos
 43:     93778     2250672  org.apache.lucene.search.FieldCacheImpl$Entry
 45:     93778     1500448  org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
 47:     86510     1384160  org.apache.lucene.index.FieldInfos

I'm running my app in search-only mode - no adds or deletes.
The counts of these objects just keep going up, even though I am explicitly 
closing the IndexSearcher.  I can see that file descriptors _are_ freed up 
after searcher.close(), because lsof no longer shows them, but the above 
objects just linger and accumulate, even when I force GC via jconsole or via 
the profiler.

I thought maybe various *Readers are not getting close()d, but I've 
double-checked all *Readers above, and they all seem to close their IndexInput 
references.  The static nested class CompoundFileReader.CSIndexInput has a 
close() without any implementation.  At first I thought that was an omission, 
but adding a close of the inner IndexInput there resulted in a search-time 
error.  I've added the lovely print debugging to various close() methods and 
see those methods being called.  I've added finalize() with some print 
debugging to SegmentReader, TermInfosReader, SegmentTermEnum, FieldsReader, and 
CompoundFileReader.  All but CFReader get finalized after a while.

My application is running as a webapp and has thousands of separate indices.  
This means it's very multi-threaded and the servlet container has a pool of 
threads that handle requests, and each request may be for a different index.  I 
cache IndexSearchers for a while, and purge/close them every N minutes if they 
have been idle more than M minutes.
It occurred to me last night that things like TermInfosReader and SegmentReader 
are using ThreadLocal, and since threads are used in a thread pool, and thus 
shared with requests handling searches against different indices, it's not 
clear to me what happens with object instances that are put in those 
ThreadLocals in such a scenario.  Aren't things going to step on each other's 
toes?
TIR has close() and SR has doClose(), so I put <TL inst>.set(null) there.  This 
immediately got rid of those instances of CompoundFileReader.CSIndexInput in my 
dev environment!!!! Yeeees!
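
One thing that may be worth checking with the set(null) change: 
ThreadLocal.set() only touches the slot of the thread that calls it.  The 
sketch below is not Lucene code; FakeReader, its perThread field and the method 
names are hypothetical stand-ins.  It assumes close() may be called from a 
thread other than the pooled thread that ran the search (for example, a 
periodic purge thread).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalCloseDemo {

    // Hypothetical stand-in for a reader that keeps per-thread state (not Lucene code).
    static final class FakeReader {
        private final ThreadLocal<Object> perThread = new ThreadLocal<Object>();

        // Simulates a search: lazily creates the calling thread's private copy.
        void search() {
            if (perThread.get() == null) {
                perThread.set(new Object());
            }
        }

        boolean hasValueForCurrentThread() {
            return perThread.get() != null;
        }

        // Mirrors the set(null) patch: clears the slot of the calling thread only.
        void close() {
            perThread.set(null);
        }
    }

    // Search on the pooled thread, close on the caller's thread, then probe the pooled thread.
    static boolean valueSurvivesCloseFromOtherThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            final FakeReader reader = new FakeReader();
            Runnable search = reader::search;
            Callable<Boolean> probe = reader::hasValueForCurrentThread;
            pool.submit(search).get();  // pooled thread populates its slot
            reader.close();             // runs on the caller's thread
            return pool.submit(probe).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Search, close and probe all on the same pooled thread, for contrast.
    static boolean valueSurvivesCloseOnSameThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            final FakeReader reader = new FakeReader();
            Callable<Boolean> all = () -> {
                reader.search();
                reader.close();
                return reader.hasValueForCurrentThread();
            };
            return pool.submit(all).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("survives close from other thread: "
                + valueSurvivesCloseFromOtherThread());
        System.out.println("survives close on same thread:    "
                + valueSurvivesCloseOnSameThread());
    }
}
```

If the purge of idle searchers runs on its own thread, the first case is the 
one that applies: the values set by the request threads stay reachable from 
those threads even after close() has run.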

But in my dev environment I tested my additions by slamming my app against a 
*single* index.  I took my modified Lucene to production, and quickly saw all 
those o.a.l.index.* objects accumulate again.  I also see a lot of 
ThreadLocal's kids:

 16:    419387    13420384  java.lang.ThreadLocal$ThreadLocalMap$Entry

I *think* that points to some issues with how that ThreadLocal is used 
there, in a multi-threaded, multi-index environment.
I'm running JDK 6, and while this problem sounds a bit like LUCENE-436, I'm not 
yet sure if it's the same thing.  
Because my IndexSearchers (and thus all those o.a.l.index.* objects) are 
long-lived, and threads are shared and reused for searching of other indices, 
those close() and doClose() methods are not called at the end of the request 
life-cycle, so at the end of the request those TL instances will *still* have 
something in them.  When their thread is later reused for searching of another 
index, new data will be put in them, but the old data will never be cleaned 
out!  No?
It seems a bit odd, but with these ThreadLocals, shouldn't a multi-threaded, 
multi-index webapp really have to "clean" those ThreadLocal instances either 
before or at the end of the request?
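
For a ThreadLocal that the application itself owns, "cleaning at the end of the 
request" could be sketched roughly as below.  CURRENT_INDEX, handle() and the 
class name are hypothetical, made up for illustration.  As far as I can tell 
the ThreadLocals inside TermInfosReader and SegmentReader are not reachable 
from application code, so this pattern would not cover them without a patch to 
Lucene itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class RequestCleanupDemo {

    // Hypothetical per-thread state owned by the application (not a Lucene field).
    static final ThreadLocal<String> CURRENT_INDEX = new ThreadLocal<String>();

    // Handles one "request" and always clears this thread's slot afterwards.
    static String handle(String indexName) {
        CURRENT_INDEX.set(indexName);
        try {
            return "searched " + CURRENT_INDEX.get();
        } finally {
            CURRENT_INDEX.remove(); // drop the entry before the thread goes back to the pool
        }
    }

    // Returns what the pooled thread still sees once a request has finished.
    static String leftoverAfterRequest() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> request = () -> handle("index-a");
            Callable<String> probe = () -> CURRENT_INDEX.get();
            pool.submit(request).get();
            return pool.submit(probe).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("leftover after request: " + leftoverAfterRequest());
    }
}
```

The try/finally is the important part: remove() runs on the request thread 
itself, which is the only thread that can clear its own slot.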

I'm running out of ideas, and was wondering if anyone has any thoughts about 
what could still be holding references to the above classes.  I have some 
20-30MB memory snapshots (via YourKit) and heap dumps (via jmap), if anyone is 
interested.

Thanks,
Otis





