May I suggest:
Don't call optimize. You don't need it. Here is my approach:
Keep each of your 250,000-document indexes separate - so run your
batch, build the index, and then just close it. Don't try to optimize
it. Put each 250,000-document batch into its own folder.
Now, when you have finished indexing all of your batches, you will have
a bunch of separate, unoptimized Lucene indexes. Open up a new, blank
index, and merge all of the batch indexes into this one. The end
result will be a single large (already optimized) index.
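The final merge step, again just a sketch with assumed paths
(addIndexes() reads the batch indexes, merges them into the new index,
and leaves the result optimized when it returns):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Merges all of the per-batch indexes into one new, optimized index.
    void mergeBatches(int batchCount) throws IOException {
        IndexWriter writer = new IndexWriter("indexes/final",
                                             new StandardAnalyzer(), true);
        Directory[] parts = new Directory[batchCount];
        for (int i = 0; i < batchCount; i++) {
            // "false" = don't create; these are only read during the merge
            parts[i] = FSDirectory.getDirectory("indexes/batch-" + i, false);
        }
        writer.addIndexes(parts);
        writer.close();
    }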
This approach has several benefits:
- You can keep the parameters set so that indexing performs better,
without running into out-of-file-handles issues.
- If a failure occurs, you only have to redo that batch, not restart
the entire process.
- You avoid unnecessary IO from constantly rewriting your data with
optimize() calls.
- You can very easily break up the indexing across multiple machines.
- If a failure occurs while merging all of the indexes together, you
don't lose anything, since you are only reading the existing indexes -
you know they will all still be valid.
I actually wrote a wrapper for Lucene that does all of this under the
covers. At some point, I should get it released open source :)
Dan
--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/