Justus Pendleton-2 wrote:
>
> 1. Why does the merge factor of 4 appear to be faster than the merge
>    factor of 2?
>
> 2. Why does non-optimized searching appear to be faster than optimized
>    searching once the index hits ~500,000 documents?
>
> 3. There appears to be a fairly sizable performance drop across the
>    board around 450,000 documents. Why is that?
>
Hi Justus,

1. Higher merge factor => more segments. Lucene (which version are you using, by the way?) keeps only a single file handle per physical file per index reader; if your benchmark is multi-threaded, more concurrently active segments mean more file handles. Since you're using an 8-core Mac Pro, I also assume you have some sort of RAID setup, which means your storage subsystem can physically handle more than one concurrent request, and that can only come into play with multiple segments.

2. Same explanation as above: an optimized index has only one segment, and contention on the file handle can actually become a bottleneck past a certain threshold. A merge factor of 2 leaves you with very few segments even for a non-optimized index, which is why the performance of a non-optimized, 2-factor index is very close to that of the optimized index. The optimal merge factor in this case will probably be a function of the complexity of your RAID setup (NAS devices can easily utilize dozens of physical drives, giving a measurable benefit to multiple concurrently active segments), but I expect your setup won't seriously benefit from an increase in the merge factor because it probably uses 4 or fewer physical drives.

3. This is trickier; my guess is that until that point most of the term-frequency data (.frq) is small enough to be kept fully in the disk read cache, and beyond that point considerably more I/O is actually performed by the storage subsystem. This can probably be measured with tools available in the OS of your choice, if you wish to corroborate this theory (I'd certainly be interested in the results).

Best of luck,
Tomer

-----
--
http://www.tomergabel.com
Tomer Gabel
--
View this message in context: http://www.nabble.com/Performance-of-never-optimizing-tp20296914p20343051.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
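For anyone who wants to poke at the file-handle contention idea from points 1 and 2 of the reply, here is a rough sketch. It is not Lucene code and does not use any Lucene API: `HandleContentionSketch`, `SharedHandle`, `createSegment` and `run` are hypothetical names made up for this illustration. Each "segment" is just a temp file behind one `RandomAccessFile`, and every seek+read pair is serialized on that handle, which is the assumed stand-in for "one file handle per physical file per index reader". Reader threads are spread round-robin across the handles, so with one segment every thread queues on the same lock, and with more segments the threads are spread out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

public class HandleContentionSketch {
    static final int BLOCK = 4096;

    /** One handle per "segment"; seek+read must be serialized on it (assumption for this sketch). */
    static final class SharedHandle implements AutoCloseable {
        private final RandomAccessFile file;
        private final long blocks;

        SharedHandle(Path path) throws IOException {
            file = new RandomAccessFile(path.toFile(), "r");
            blocks = file.length() / BLOCK;
        }

        int readRandomBlock(byte[] buf) throws IOException {
            long pos = ThreadLocalRandom.current().nextLong(blocks) * BLOCK;
            synchronized (file) {          // all threads sharing this handle queue here
                file.seek(pos);
                file.readFully(buf, 0, BLOCK);
            }
            return BLOCK;
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }

    /** Creates a zero-filled temp file standing in for a segment file. */
    static Path createSegment(int blocks) throws IOException {
        Path path = Files.createTempFile("segment", ".bin");
        path.toFile().deleteOnExit();
        Files.write(path, new byte[blocks * BLOCK]);
        return path;
    }

    /** Spreads reader threads round-robin over the handles; returns {bytesRead, elapsedNanos}. */
    static long[] run(int segments, int threads, int readsPerThread) throws Exception {
        List<SharedHandle> handles = new ArrayList<>();
        for (int i = 0; i < segments; i++) {
            handles.add(new SharedHandle(createSegment(256)));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                SharedHandle handle = handles.get(t % segments);
                results.add(pool.submit(() -> {
                    byte[] buf = new byte[BLOCK];
                    long total = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        total += handle.readRandomBlock(buf);
                    }
                    return total;
                }));
            }
            long bytes = 0;
            for (Future<Long> f : results) {
                bytes += f.get();
            }
            return new long[] { bytes, System.nanoTime() - start };
        } finally {
            pool.shutdown();
            for (SharedHandle h : handles) {
                h.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (int segments : new int[] {1, 2, 4, 8}) {
            long[] r = run(segments, 8, 20_000);
            System.out.printf("segments=%d bytes=%d elapsed=%.1f ms%n",
                    segments, r[0], r[1] / 1e6);
        }
    }
}
```

Note that the temp files here are tiny and will sit entirely in the OS page cache, so this only exercises lock contention on the handle, not the RAID-level parallelism from point 1 or the cache-exhaustion guess from point 3. Timings will vary a lot by machine and JVM, so treat any numbers as a starting point for a proper benchmark rather than as evidence either way.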