> [Toke: No frequent updates]
> 
> So everything is rebuilt from scratch each time? Or do you mean that you're
> only adding new documents, not changing old ones?

Everything is reindexed from scratch - indexing speed is not essential to us...

> Either way, optimizing to a single 140GB segment is heavy. Ignoring the
> relatively light processing of the data, the I/O for merging is still at the
> very minimum to read and write the 140GB. Even if you can read and write
> 100MB/sec it still takes an hour. This is of course not that relevant if
> you're fine with a nightly batch job.

Sorry - I wasn't clear here. The total index size ends up being 140GB, but to 
try to improve performance we build 50 separate indexes (each a bit under 
3GB) and then open them with a ParallelMultiSearcher. The only reason I tried 
this multi-searcher approach was to toy around with Katta, which ended up not 
working out for us. I can also deploy it as a RemoteSearchable (although I'm 
not sure whether that is deprecated or not).
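
In case it helps, this is roughly how the shards get opened - a sketch 
against the Lucene 3.0.x API; the shard-N directory names are just for 
illustration:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.store.FSDirectory;

public class ShardedSearch {
  // Opens one read-only searcher per shard directory and wraps them in a
  // ParallelMultiSearcher, which fans each query out to all shards.
  public static ParallelMultiSearcher openShards(File baseDir, int shardCount)
      throws Exception {
    Searchable[] shards = new Searchable[shardCount];
    for (int i = 0; i < shardCount; i++) {
      IndexReader reader =
          IndexReader.open(FSDirectory.open(new File(baseDir, "shard-" + i)), true);
      shards[i] = new IndexSearcher(reader);
    }
    return new ParallelMultiSearcher(shards);
  }
}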
 
> > By more segments do you mean not call optimize() at index time?
> 
> Either that or calling it with maxNumSegments 10, where 10 is just a wild
> guess. Your mileage will vary:
> http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/index/IndexWriter.html#optimize%28int%29

Is this preferred (in terms of performance) to the approach above (splitting 
into multiple indexes)?
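
If I follow, that would look something like this - a sketch against Lucene 
3.0.x, with a made-up index path:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class PartialOptimize {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/path/to/index")),  // hypothetical path
        new StandardAnalyzer(Version.LUCENE_30),
        false,  // open the existing index rather than creating a new one
        IndexWriter.MaxFieldLength.UNLIMITED);
    // ... add documents as usual ...
    writer.optimize(10);  // merge down to at most 10 segments instead of 1
    writer.close();
  }
}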

> As Erick Erickson recently wrote: "Since it doesn't make sense to me, that
> must mean I don't understand the problem very thoroughly".

Not yet! I've added some benchmarking code to track performance as I make 
these changes. Do you happen to know whether the Lucene benchmark package is 
still in use / a good thing to toy around with?
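
For what it's worth, the benchmarking code is nothing fancy - roughly this 
kind of sketch. It times each query a few times and keeps the best run, so 
JIT warm-up and cold caches don't skew the numbers:

import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopDocs;

public class QueryTimer {
  public static void time(Searcher searcher, Query[] queries) throws Exception {
    for (Query q : queries) {
      long best = Long.MAX_VALUE;
      TopDocs hits = null;
      for (int run = 0; run < 5; run++) {
        long start = System.nanoTime();
        hits = searcher.search(q, 10);  // top-10 is arbitrary here
        long elapsed = System.nanoTime() - start;
        if (elapsed < best) best = elapsed;
      }
      System.out.printf("%s: %d hits, best of 5: %.1f ms%n",
          q, hits.totalHits, best / 1e6);
    }
  }
}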

Thanks for all your suggestions,
-Chris
