Thanks Paulo,

I actually do something very similar.  I have a queue of all pending
updates and a Thread that manages the queue.  When the queue reaches
about 100 entries or is 30 seconds old (whichever comes first) I
process it, which results in all the index writes.  I also always
optimize() and close() the writer at the end of each batch.  It
doesn't seem to make any difference how often I do this; before long
there are too many open files.  Creating a new FSDirectory and a new
IndexWriter each time I process the queue does not help either.
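
Roughly, the queue thread looks like this (a simplified sketch of
what I described above; flushBatch() is a stand-in for the code that
opens the writer, adds each document, then optimizes and closes it):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    import org.apache.lucene.document.Document;

    public class UpdateQueue implements Runnable {

        private final BlockingQueue<Document> pending =
                new LinkedBlockingQueue<Document>();

        public void add(Document doc) throws InterruptedException {
            pending.put(doc);
        }

        public void run() {
            List<Document> batch = new ArrayList<Document>();
            long batchStart = 0;
            while (true) {
                try {
                    // wait briefly for the next pending update
                    Document doc = pending.poll(1, TimeUnit.SECONDS);
                    if (doc != null) {
                        if (batch.isEmpty()) {
                            // batch age is measured from its first entry
                            batchStart = System.currentTimeMillis();
                        }
                        batch.add(doc);
                    }
                    boolean full = batch.size() >= 100;
                    boolean stale = !batch.isEmpty()
                            && System.currentTimeMillis() - batchStart >= 30 * 1000L;
                    if (full || stale) {
                        // open a writer, addDocument() each entry,
                        // then optimize() and close(), as described above
                        flushBatch(batch);
                        batch.clear();
                    }
                } catch (InterruptedException e) {
                    return;  // shut down the queue thread
                }
            }
        }

        private void flushBatch(List<Document> batch) {
            // stand-in for the actual index-writing code
        }
    }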

I will try Otis' idea of using a compound index structure and report back.
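
If I understand that suggestion correctly, it just means enabling the
compound file format on the writer before indexing; a minimal sketch,
assuming Lucene 1.4 or later (the path and analyzer here are
placeholders):

    IndexWriter writer = new IndexWriter("/path/to/index",
            new StandardAnalyzer(), false);
    // Each segment's separate files get packed into a single .cfs
    // file, so a segment costs roughly one file descriptor instead
    // of seven or more.
    writer.setUseCompoundFile(true);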

Cheers,

Nick.

Paulo Silveira wrote:
> Nick!
>
> I also had the same problem. Now, in my SearchEngine class, when I
> write a document to the index, I check whether the number of documents
> mod 100 is 0. If it is, I call optimize().
>
> optimize() reduces the number of segment files used by the index, so
> the number of open files is also reduced.
>
> Take a look:
>
>       private synchronized void write(Document document) throws IOException {
>               logger.debug("writing document");
>               IndexWriter writer = openWriter();
>               if (writer.docCount() % 100 == 0) {
>                       // avoid too many open files: optimize every 100 documents
>                       logger.info("optimizing indexes...");
>                       writer.optimize();
>               }
>               writer.addDocument(document);
>               writer.close();
>               reopenSearcher();
>               logger.debug("document written");
>       }
>
> I did not try to find the best value. 100 seems OK, although
> optimizing my indexes already takes 2 seconds (and in a synchronized
> method that is not so good).
>
> Tell me what you think.
>
>
> On 3/16/06, Nick Atkins <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> What's the best way to manage the number of open files used by Lucene
>> when it's running under Tomcat?  I have an indexing application
>> running as a web app, and I index a huge number of mail messages
>> (upwards of 40000 in some cases).  Lucene's merging routine always
>> craps out eventually with a "too many open files" error, regardless
>> of how large I set ulimit.  lsof tells me the files are all
>> "deleted", but they still seem to count as open files.  I don't want
>> to set ulimit to some enormous value just to solve this (because it
>> will never be large enough).  What's the best strategy here?
>>
>> I have tried setting various parameters on the IndexWriter, such as
>> MergeFactor, MaxMergeDocs and MaxBufferedDocs, but they seem to
>> affect only the timing of merges with respect to memory usage.  The
>> number of files used seems to be unaffected by anything I can set on
>> the IndexWriter.
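>>
>> For reference, this is roughly what I tried (a sketch; the values
>> are just ones I experimented with, not recommendations):
>>
>>     IndexWriter writer = new IndexWriter(dir, analyzer, false);
>>     // a lower merge factor keeps fewer segments on disk at once,
>>     // which should mean fewer files open during merges
>>     writer.setMergeFactor(5);
>>     // buffer more documents in RAM before each new on-disk
>>     // segment is written
>>     writer.setMaxBufferedDocs(1000);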
>>
>> Any hints much appreciated.
>>
>> Cheers,
>>
>> Nick.
>>
>
>
> --
> Paulo E. A. Silveira
> Caelum Ensino e Soluções em Java
> http://www.caelum.com.br/
>
