Something that did change at some point, though I can't remember when, was the way that discarded but not explicitly closed searchers/readers are handled. I think they used to get garbage collected, causing their open files to be closed, but now they need to be closed explicitly. It sounds to me like you are opening new searchers/readers without closing the old ones.
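If it helps, here is a rough sketch of the reopen-and-close pattern I mean. The holder class and field names are just illustrative, not taken from your code, and a real implementation would also have to make sure no search is still running against the old reader before closing it (reference counting, for example):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Sketch only: one shared reader/searcher pair, refreshed after commits.
// The important part is closing the old reader once it has been replaced.
class SearcherHolder {
    private IndexReader reader;      // current reader, shared by search threads
    private IndexSearcher searcher;  // wraps the current reader

    SearcherHolder(IndexReader initial) {
        reader = initial;
        searcher = new IndexSearcher(initial);
    }

    synchronized IndexSearcher get() {
        return searcher;
    }

    // Call after the writer commits: reopen the reader and explicitly close
    // the old one, otherwise its segment files stay open indefinitely.
    synchronized void maybeRefresh() throws IOException {
        IndexReader newReader = reader.reopen();  // 3.x API; returns the same instance if nothing changed
        if (newReader != reader) {
            IndexReader old = reader;
            reader = newReader;
            searcher = new IndexSearcher(newReader);
            old.close();                          // releases the old segments' file handles
        }
    }
}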
-- Ian.

On Fri, Jan 6, 2012 at 6:50 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Can you show the code? In particular, are you re-opening the index writer?
>
> Bottom line: this isn't a problem anyone expects in 3.1 absent some programming error on your part, so it's hard to know what to say without more information.
>
> 3.1 does have other problems if you use spellcheck.collate; if you use that feature you might want to upgrade to at least 3.3. But I truly believe that is irrelevant to your problem.
>
> Best
> Erick
>
> On Fri, Jan 6, 2012 at 1:25 PM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
>> Thanks for the reply. I'm still having trouble. I've made some changes to use commit over close, but I'm not seeing much change in what looks like an ever-increasing number of open file handles. I'm developing on Mac OS X 10.6 and testing on Linux CentOS 4.5. My biggest problem is that I can't tell why lsof says this process has so many open files. I'm seeing the same files opened more than once, and I'm seeing files show up in the lsof output that don't exist on the file system. For example, here is the lucene directory:
>>
>>> -rw-r--r-- 1 root root  328396 Jan  5 20:21 _ly.fdt
>>> -rw-r--r-- 1 root root    6284 Jan  5 20:21 _ly.fdx
>>> -rw-r--r-- 1 root root    2253 Jan  5 20:21 _ly.fnm
>>> -rw-r--r-- 1 root root  234489 Jan  5 20:21 _ly.frq
>>> -rw-r--r-- 1 root root   15704 Jan  5 20:21 _ly.nrm
>>> -rw-r--r-- 1 root root 1113954 Jan  5 20:21 _ly.prx
>>> -rw-r--r-- 1 root root    5421 Jan  5 20:21 _ly.tii
>>> -rw-r--r-- 1 root root  445988 Jan  5 20:21 _ly.tis
>>> -rw-r--r-- 1 root root  118262 Jan  6 09:56 _nx.cfs
>>> -rw-r--r-- 1 root root   10009 Jan  6 10:00 _ny.cfs
>>> -rw-r--r-- 1 root root      20 Jan  6 10:00 segments.gen
>>> -rw-r--r-- 1 root root     716 Jan  6 10:00 segments_kw
>>
>> And here is an excerpt from: lsof -p 19422 | awk -- '{print $9}' | sort
>>
>> ...
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> ...
>>
>> As you can see, none of those files actually exists on the file system. Not only that, they are each opened 8 or 9 times. There are tons of these non-existent, repeatedly opened files in the output. So why are those handles being counted as open?
>>
>> I have a single IndexWriter and a single IndexSearcher open on a single CFS directory. The writer is only used by a single thread, but the IndexSearcher can be shared among several threads. I still think something changed in 3.1 that's causing this. I hope you can help me understand how it's not.
>>
>> Charlie
>>
>> On Mon, Jan 2, 2012 at 3:03 PM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
>>
>>> hey charlie,
>>>
>>> there are a couple of wrong assumptions in your last email, mostly related to merging. mergeFactor = 10 doesn't mean that you end up with one file, nor is it related to files at all. Still, my first guess is that you are using the CompoundFileSystem (CFS), so each segment corresponds to a single file. The merge factor relates to segments and is responsible for triggering segment merges based on their size (either in bytes or in documents). For more details see this blog:
>>>
>>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>>>
>>> If you are using CFS, one segment is one file. In 3.1, CFS is only used if the target segment is smaller than the noCFSRatio.
>>> That prevents segments that are bigger than a given fraction of the existing index (by default 0.1, i.e. 10%) from being packed into CFS.
>>>
>>> This means your index might create non-CFS segments consisting of multiple files (10 in the worst case... maybe I missed one, but anyway...), which means the number of open files increases.
>>>
>>> This is only a guess, since I don't know what you are doing with your index readers etc. Which platform are you on, and what is the file descriptor limit? In general it's OK to raise the FD limit on your OS and just let Lucene do its job. If you are restricted in any way, you can set LogMergePolicy#setNoCFSRatio(double) to 1.0 and see whether you are still seeing the problem.
>>>
>>> About commit vs. close - in general it's not a good idea to close your IW at all. I'd keep it open as long as you can and commit if needed. Even optimize is somewhat overrated and should be used with care, or not at all... (here is another writeup regarding optimize:
>>> http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you )
>>>
>>> hope that helps,
>>>
>>> simon
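To make Simon's suggestion concrete, here is a rough sketch of a single long-lived writer that commits instead of closing, with LogMergePolicy#setNoCFSRatio set to 1.0 so merged segments are always packed into a single .cfs file. The directory path, analyzer and class name below are placeholders, not taken from your setup:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

class WriterSetup {
    static IndexWriter open(File indexDir) throws IOException {
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setUseCompoundFile(true);
        mp.setNoCFSRatio(1.0);   // always pack merged segments into a single .cfs file

        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31,
                new StandardAnalyzer(Version.LUCENE_31));
        cfg.setMergePolicy(mp);

        // One long-lived writer; never close it per document.
        return new IndexWriter(FSDirectory.open(indexDir), cfg);
    }
}

// In the push handler:
//   writer.addDocument(doc);
//   // and every N documents (or on a timer):
//   writer.commit();   // durable and visible to reopened readers, writer stays open

Whether forcing CFS like this is a good idea depends on your segment sizes; as Simon says, raising the FD limit and letting Lucene do its job is usually the simpler fix.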
>>> On Mon, Jan 2, 2012 at 5:38 PM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
>>> > I'm beginning to think there is an issue with 3.1 that's causing this. After looking over my code again, I'd forgotten that the mechanism that does the indexing hasn't changed, and the index IS being closed between cycles, even when using push vs. pull. This code used to work on 2.x Lucene, but I had to upgrade it. It had been very stable under 2.x, but after upgrading to 3.1 I've started seeing this problem. I double-checked the code doing the indexing, and it hasn't changed since I upgraded to 3.1. So the constant in this equation is mostly my code; what's different is 3.1. Furthermore, when new documents are pulled in through the old mechanism the open file count continues to rise. Over a 24-hour period it has grown by +296 files, but only 10 or 12 documents were indexed.
>>> >
>>> > So is this a known issue? Should I upgrade to a newer version to fix this?
>>> >
>>> > Thanks
>>> > Charlie
>>> >
>>> > On Sat, Dec 31, 2011 at 1:01 AM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
>>> >
>>> >> I have a program I recently converted from a pull scheme to a push scheme. Previously I was pulling down the documents I was indexing, and when I was done I'd close the IndexWriter at the end of each iteration. Now that I've converted to a push scheme I'm sent the documents to index, and I write them as they arrive. This means I'm not closing the IndexWriter, since closing after every document would perform poorly; instead I'm keeping the IndexWriter open all the time. The problem is that after a while the number of open files keeps rising. I've set the following parameters on the IndexWriter:
>>> >>
>>> >> merge.factor=10
>>> >> max.buffered.docs=1000
>>> >>
>>> >> After going over the API docs I thought this would mean it'd never create more than 10 files before merging them into a single file, but it's creating hundreds of files. Since I'm not closing the IndexWriter, will it still merge the files? From reading the API docs it sounded like merging happens regardless of flush, commit, or close. Is that true? I've measured the files that are increasing, and they are the files associated with this one index I'm leaving open. I have another index that I do close periodically, and it's not growing like this one.
>>> >>
>>> >> I've read some posts about using commit() instead of close() in situations like this because it performs better. However, commit() just flushes to disk rather than flushing and optimizing like close(). I'm not sure whether commit() is what I need. Any suggestions?
>>> >>
>>> >> Thanks
>>> >> Charlie
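P.S. Charlie, the two settings from your original mail map onto the 3.x API roughly as below. This is only a sketch; I don't know how your configuration is actually wired up:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.util.Version;

class MergeSettings {
    // merge.factor=10 and max.buffered.docs=1000, expressed against the 3.x API.
    static IndexWriterConfig configure() {
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setMergeFactor(10);        // merge once roughly 10 segments pile up at a level

        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31,
                new StandardAnalyzer(Version.LUCENE_31));
        cfg.setMergePolicy(mp);
        cfg.setMaxBufferedDocs(1000); // flush a new segment every 1000 buffered docs
        return cfg;
    }
}

And to answer the question in that mail: as far as I know, merges are triggered as segments get flushed or committed on the open writer, so you do not need to close() the writer for merging to happen.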