Thanks Paulo, I actually do something very similar. I have a queue of all pending updates and a Thread that manages the queue. When the queue reaches about 100 entries or is 30 seconds old (whichever comes first) I process it, which results in all the index writes. I also always optimize() and close() the writer at the end of each queue flush. It doesn't seem to make any difference how often I do this; before long there are too many open files. Creating a new FSDirectory and a new IndexWriter for each queue flush does not help either.
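
For the record, each flush boils down to roughly this (a simplified sketch only; the index path, analyzer and queue type here are placeholders, not my real settings):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    // Simplified sketch of the batch flush; names are illustrative only.
    private synchronized void flushQueue(List pendingDocs) throws IOException {
        // false = append to the existing index rather than create a new one
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        try {
            for (Iterator i = pendingDocs.iterator(); i.hasNext();) {
                writer.addDocument((Document) i.next());
            }
            writer.optimize();   // merge segments down after every batch
        } finally {
            writer.close();      // release the write lock and file handles
        }
        pendingDocs.clear();
    }

Even with optimize() and close() on every batch, the file count still climbs.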
I will try Otis' idea of using the compound index file format and report back (a quick sketch of what I plan to try is below, after the quoted message).

Cheers, Nick.

Paulo Silveira wrote:
> Nick!
>
> I had the same problem too. Now in my SearchEngine class, when I
> write a document to the index, I check whether the number of documents
> mod 100 is 0. If it is, I call optimize().
>
> optimize() reduces the number of files used by the index, so the
> number of open files is also reduced.
>
> Take a look:
>
>     private synchronized void write(Document document) throws IOException {
>         logger.debug("writing document");
>         IndexWriter writer = openWriter();
>         if (writer.docCount() % 100 == 0) {
>             // avoiding too many open files, indexing 100 by 100.
>             logger.info("optimizing indexes...");
>             writer.optimize();
>         }
>         writer.addDocument(document);
>         writer.close();
>         reopenSearcher();
>         logger.debug("document written");
>     }
>
> I did not try to find the best value. 100 seems OK, although optimizing
> my indexes already takes 2 seconds (and in a synchronized method
> that is not so good).
>
> Tell me what you think.
>
>
> On 3/16/06, Nick Atkins <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> What's the best way to manage the number of open files used by Lucene
>> when it's running under Tomcat? I have an indexing application running
>> as a web app and I index a huge number of mail messages (upwards of
>> 40000 in some cases). Lucene's merging routine always craps out
>> eventually with "too many open files", regardless of how large I set
>> ulimit. lsof tells me they are all "deleted" but they still seem to
>> count as open files. I don't want to set ulimit to some enormous value
>> just to solve this (because it will never be large enough). What's the
>> best strategy here?
>>
>> I have tried setting various parameters on the IndexWriter such as
>> MergeFactor, MaxMergeDocs and MaxBufferedDocs, but they seem to only
>> affect the merge timing algorithm wrt memory usage. The number of files
>> used seems to be unaffected by anything I can set on the IndexWriter.
>>
>> Any hints much appreciated.
>>
>> Cheers,
>>
>> Nick.
>>
>
>
> --
> Paulo E. A. Silveira
> Caelum Ensino e Soluções em Java
> http://www.caelum.com.br/
>
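
As mentioned at the top, here is roughly what I plan to try for the compound file format (just a sketch; same placeholder index path and analyzer as in my earlier snippet):

    // Sketch: same flush as above, but with the compound file format enabled.
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.setUseCompoundFile(true);  // pack each segment's files into a single .cfs file
    try {
        // ... addDocument() calls for the batch, as before ...
        writer.optimize();
    } finally {
        writer.close();
    }

If I understand it correctly, that collapses the per-segment files (.fnm, .frq, .prx, etc.) into one .cfs per segment, which should keep the open file count well below ulimit.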