Hi,

Wow, that was fast - java-user support is just as fast as I heard! ;)
I'll try your patch shortly. Like I said, the bug may be in my application.  
Here is a clue.  Memory usage increases with the number of open files (file 
descriptors) on the system, and lsof gives:

COMMAND     PID    USER   FD      TYPE     DEVICE      SIZE       NODE NAME
...
java      24476   xxx *562r      REG      253,0    657314    8470761 /xxx/users/1/1-home/link/index/_20k.cfs
java      24476   xxx *894r      REG      253,0    657314    8470761 /xxx/users/1/1-home/link/index/_20k.cfs
java      24476   xxx *078r      REG      253,0    657314    8470761 /xxx/users/1/1-home/link/index/_20k.cfs
java      24476   xxx *648r      REG      253,0    657314    8470761 /xxx/users/1/1-home/link/index/_20k.cfs
...

If I'm reading this right, it tells me that the same file has been opened a number of different times (note the FD column: all the file descriptors are different).  That must correspond to multiple new IndexSearcher(...) calls, no?  Multiple new IndexSearcher(...) calls on the same index are okay in my case: because the system has tens of thousands of separate indices, it can't keep all IndexSearchers open at all times, so I use an LRU algorithm to keep only the recently used IndexSearchers open.  The others I "let go" without an explicit close() call.  The assumption is that the old IndexSearchers "expire", i.e. get garbage collected, once I'm no longer holding references to them.
If I understand this correctly, the fact that I see these open file descriptors all pointing to the same index file tells me that the old IndexSearchers are just hanging around and not getting cleaned up.
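
One obvious fix on my side would be to close() a searcher the moment the LRU evicts it, instead of waiting for the GC.  Something along these lines is what I have in mind - a rough sketch only, built on a LinkedHashMap in access order; the class and method names are made up for illustration and this is not my actual code:

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.search.IndexSearcher;

// Rough sketch: an access-ordered LRU of IndexSearchers that close()s
// a searcher when it falls off the end, instead of letting the GC reap it.
public class SearcherLruCache extends LinkedHashMap<String, IndexSearcher> {

    private final int maxOpen;

    public SearcherLruCache(int maxOpen) {
        super(16, 0.75f, true);   // accessOrder=true gives LRU ordering
        this.maxOpen = maxOpen;
    }

    // Get (or open) the searcher for the given index directory.
    public synchronized IndexSearcher getSearcher(String indexPath) throws IOException {
        IndexSearcher searcher = get(indexPath);
        if (searcher == null) {
            searcher = new IndexSearcher(indexPath);
            put(indexPath, searcher);
        }
        return searcher;
    }

    protected boolean removeEldestEntry(Map.Entry<String, IndexSearcher> eldest) {
        if (size() > maxOpen) {
            try {
                // Release the file descriptors right away rather than
                // relying on finalization / garbage collection.
                eldest.getValue().close();
            } catch (IOException e) {
                // log and move on; the entry is evicted either way
            }
            return true;
        }
        return false;
    }
}

The one thing I'd have to be careful about is a thread that is still searching with a searcher at the moment it gets evicted, so in practice I'd probably add some reference counting or a short grace period before the close().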

I can also see the number of file descriptors increasing with time:

$ /usr/sbin/lsof | grep -c '/1-home/link/index/_20k.cfs'
14
$ /usr/sbin/lsof | grep -c '/1-home/link/index/_20k.cfs'
23
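
Side note: the same count can also be watched from inside the app.  Assuming a Sun JVM on Unix that exposes com.sun.management.UnixOperatingSystemMXBean, a quick sketch like this (not part of my app) would log it:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Quick sketch: log the JVM's own open-file-descriptor count,
// so FD growth can be tracked without running lsof externally.
public class FdMonitor {
    public static void logOpenFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            System.out.println("open FDs: " + unixOs.getOpenFileDescriptorCount()
                + " / max: " + unixOs.getMaxFileDescriptorCount());
        }
    }
}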

This may still not prove that the bug is in my app, but it does point to something not releasing IndexSearcher/IndexReader instances, since those are no longer getting GCed the way they were before.  I did not change my logic for creating new IndexSearchers (included inline in my previous email).  On the other hand, this app has recently started getting a lot more search traffic, so perhaps the GC just isn't cleaning things up fast enough...
I happen to have lsof output from the same system from July, and I see the same thing there: a number of FDs open and pointing to the same .cfs index file.  Perhaps the JVM GC was able to keep up with the cleanup then, and now it can't, because the CPU is maxed out... really maxed out.

Otis

----- Original Message ----
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, November 30, 2006 6:51:48 AM
Subject: Re: 2.1-dev memory leak?

Otis Gospodnetic wrote:
> Hi,
> 
> Is anyone running Lucene trunk/HEAD version in a serious production system?  
> Anyone noticed any memory leaks?
> 
> I'm asking because I recently bravely went from 1.9.1 to 2.1-dev (trunk from 
> about a week ago) and all of a sudden my application that was previously 
> consuming about 1.5GB (-Xmx1500m) now consumes 2.2GB, and blows up after it 
> exhausts the whole heap and the GC can't make any more room there, after 
> running for about 3-6 hours and handling several tens of thousands of queries.

Whoa, I'm sorry to hear this Otis :(

> I'd love to go back to 2.0.0, or even back to 1.9.1, and run that for a while 
> just to double-check that it really is the Lucene upgrade that is the 
> source of the leak, but unfortunately because of LUCENE-701 (lockless 
> commits), I can't go back that easily without reindexing...
> 
> Moreover, I just looked at CHANGES.txt from 1.9.1 to present, and I think the 
> biggest change since then was LUCENE-701.

The file-format changes for lockless commits are small enough that
making a tool to back-convert a lockless format index into a
pre-lockless format index (so that Lucene 2.0 can read/write to it) is
fairly simple.

OK I coded up a first version.  I will open a JIRA issue and attach a
patch.

We clearly need to also get to the bottom of where the memory leak is,
but I think first priority is to stabilize your production
environment.  Hopefully this tool can at least get you back up in
production and then also enable us to narrow down where the memory
leak is.

Please tread carefully though: it makes me very nervous that this tool
I just created would be used in your production environment!
Obviously first test it in a sandbox, running against your production
index(es).

Mike
