May I suggest:
Don't call optimize. You don't need it. Here is my approach:
Keep each of your 250,000-document indexes separate - so run your
batch, build the index, and then just close it. Don't try to optimize
it. Put each 250,000-document batch into its own folder.
Now, when you have finished indexing all of your batches, you will have
a bunch of separate, unoptimized Lucene indexes. Open up a new, blank
index, and merge all of the batch indexes into this one. The end
result will be a single large (already optimized) index.
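The final merge step, again just a sketch with assumed paths
(addIndexes() reads the batch indexes, merges them into the new index,
and leaves the result optimized when it returns):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Merges all of the per-batch indexes into one new, optimized index.
    void mergeBatches(int batchCount) throws IOException {
        IndexWriter writer = new IndexWriter("indexes/final",
                                             new StandardAnalyzer(), true);
        Directory[] parts = new Directory[batchCount];
        for (int i = 0; i < batchCount; i++) {
            // "false" = don't create; these are only read during the merge
            parts[i] = FSDirectory.getDirectory("indexes/batch-" + i, false);
        }
        writer.addIndexes(parts);
        writer.close();
    }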
This approach has several benefits:
- You can keep the parameters set so that indexing performs better,
without running into out-of-file-handles issues.
- If a failure occurs, you only have to redo that batch, not restart
the entire process.
- You avoid unnecessary IO from constantly rewriting your data with
optimize() calls.
- You can very easily break up the indexing across multiple machines.
- If a failure occurs while merging all of the indexes together, you
don't lose anything, since you are only reading the existing indexes -
you know they will all still be valid.
I actually wrote a wrapper for Lucene that does all of this under the
covers. At some point, I should get it released open source :)
Dan
--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/