> [Toke: No frequent updates]
>
> So everything is rebuilt from scratch each time? Or do you mean that you're
> only adding new documents, not changing old ones?
Everything is reindexed from scratch - indexing speed is not essential to us...

> Either way, optimizing to a single 140GB segment is heavy. Ignoring the
> relatively light processing of the data, the I/O for merging is still at the
> very minimum to read and write the 140GB. Even if you can read and write
> 100MB/sec it still takes an hour. This is of course not that relevant if
> you're fine with a nightly batch job.

Sorry - I wasn't clear here. The total index size ends up being 140GB, but to
try to improve search performance we build 50 separate indexes (which end up
being a bit under 3GB each) and then open them with a parallel multisearcher
(rough sketch in the P.S. below). The only reason I tried this multisearcher
approach was to toy around with Katta, which ended up not working out for us.
I can also deploy it as a RemoteSearchable (although I'm not sure whether that
is deprecated or not).

> > By more segments do you mean not calling optimize() at index time?
>
> Either that or calling it with maxNumSegments 10, where 10 is just a wild
> guess. Your mileage will vary:
> http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/index/IndexWriter.html#optimize%28int%29

Is that preferred (in terms of performance) over the above approach (splitting
into multiple indexes)?

> As Erick Erickson recently wrote: "Since it doesn't make sense to me, that
> must mean I don't understand the problem very thoroughly".

Not yet! I've added some benchmarking code to keep track of performance as I
make these changes. Do you happen to know if the Lucene benchmark package is
still in use / a good thing to toy around with?

Thanks for all your suggestions,
-Chris
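
P.S. In case it helps the archives, here's roughly what our sharded setup
looks like. This is just a sketch against the Lucene 3.0 API; the on-disk
layout (/indexes/shard-0 ... /indexes/shard-49) and the shard count are
placeholders, and exception handling is omitted:

    import java.io.File;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.store.FSDirectory;

    // Open each of the ~3GB shard indexes read-only.
    Searchable[] shards = new Searchable[50];
    for (int i = 0; i < shards.length; i++) {
        shards[i] = new IndexSearcher(
            FSDirectory.open(new File("/indexes/shard-" + i)), true); // true = read-only
    }
    // Search all shards in parallel; the multisearcher merges the results.
    Searcher searcher = new ParallelMultiSearcher(shards);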
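
If we try the partial optimize you suggested instead, I'm assuming it would
look something like this (again the 3.0 API; the path is made up):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // One big index instead of 50 shards.
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/indexes/full")),
        new StandardAnalyzer(Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);
    // ... addDocument() calls during the nightly rebuild ...
    writer.optimize(10); // partial optimize: merge down to at most 10 segments
    writer.close();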
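
And the benchmarking I mentioned is nothing fancy, just wall-clock timing
around the searches, along these lines ('searcher' is from the sketch above;
'queries' is a hypothetical list of test queries):

    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    // Crude per-query timing so we can compare configurations.
    for (Query q : queries) {
        long start = System.nanoTime();
        TopDocs hits = searcher.search(q, 10); // top 10 is arbitrary
        long ms = (System.nanoTime() - start) / 1000000L;
        System.out.println(q + ": " + hits.totalHits + " hits in " + ms + " ms");
    }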