Found a mistake in my response... when I was talking about max merge docs, I meant max buffered docs. If you're going to optimize anyway, the key setting appears to be max buffered docs, and I have yet to see the merge factor affect anything (again, only if you optimize). Oddly, performance seems to decrease as you raise max buffered docs, long before you are anywhere close to running out of available RAM. I do not know why this is, but you should certainly test to see what your best settings are.
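
One way to run that kind of test is a small timing loop over candidate max buffered docs values. The sketch below is only a harness under stated assumptions: the class and method names are made up for illustration, and the indexRun callback is hypothetical. It would have to create a fresh index with the given setting, add the test documents, optimize, and close. No Lucene calls are shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Times one indexing run per candidate max buffered docs value. */
class BufferedDocsSweep {

    /** One measurement: the setting tried and the wall-clock time it took. */
    static final class Result {
        final int maxBufferedDocs;
        final long elapsedNanos;

        Result(int maxBufferedDocs, long elapsedNanos) {
            this.maxBufferedDocs = maxBufferedDocs;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /**
     * Calls indexRun once per candidate and records the elapsed time.
     * indexRun is a hypothetical callback (an assumption of this sketch):
     * it should build, optimize, and close an index using the given value.
     */
    static List<Result> sweep(int[] candidates, IntConsumer indexRun) {
        List<Result> results = new ArrayList<>();
        for (int candidate : candidates) {
            long start = System.nanoTime();
            indexRun.accept(candidate);
            results.add(new Result(candidate, System.nanoTime() - start));
        }
        return results;
    }

    /** Returns the measurement with the smallest elapsed time. */
    static Result fastest(List<Result> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("no results to compare");
        }
        Result best = results.get(0);
        for (Result r : results) {
            if (r.elapsedNanos < best.elapsedNanos) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] candidates = {10, 50, 100, 1000, 2000};
        // Placeholder workload so the sketch runs on its own;
        // replace the empty lambda with a real indexing run.
        List<Result> results = sweep(candidates, n -> { });
        for (Result r : results) {
            System.out.println(r.maxBufferedDocs + " -> " + r.elapsedNanos / 1_000_000 + " ms");
        }
    }
}
```

A single run per setting is noisy, so it is probably worth repeating each run a few times and using the same document set every time.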

Also, the new benchmarking stuff is awesome.

- Mark

Grant Ingersoll wrote:
You may find contrib/Benchmark useful in your testing. Doron Cohen has added a nice framework for scripting benchmarking tests.

-Grant

On Feb 11, 2007, at 12:14 PM, Mark Miller wrote:

Not sensible at all. First, a merge factor above something like 90 most likely never makes sense. Second, I have done some testing, and my results show that if you optimize the index after loading, the merge factor really doesn't matter, so keep it at 10. (I never used a max merge docs below 50. 100 worked best; 1,000 and 2,000 slowed things down even though the test had access to 600MB of RAM and the docs were around 10-20k each.) Setting up a test harness that automatically indexes a good number of docs (I did 20,000) with a variety of settings will tell you a lot. Results will obviously vary with your setup.
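
To put a rough number on why a huge merge factor is a problem, here is a back-of-the-envelope model of how many segments are left on disk before the final optimize. It is a simplification and not Lucene's actual merge code: it assumes one segment is flushed per max buffered docs documents, and that every merge factor segments at one level are merged into a single segment at the next level. The class and method names are made up for illustration.

```java
/** Simplified model of segment buildup during indexing (not Lucene's real merge logic). */
class SegmentCountModel {

    /**
     * Estimates the number of segments on disk after adding totalDocs
     * documents and before any optimize, under the assumptions above.
     */
    static long segmentsBeforeOptimize(long totalDocs, int maxBufferedDocs, int mergeFactor) {
        if (totalDocs < 0 || maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("invalid settings");
        }
        // Each full or partial buffer is flushed as one segment (ceiling division).
        long flushed = (totalDocs + maxBufferedDocs - 1) / maxBufferedDocs;

        // Merging every mergeFactor segments into one at the next level leaves
        // as many segments as the digit sum of 'flushed' written in base mergeFactor.
        long segments = 0;
        while (flushed > 0) {
            segments += flushed % mergeFactor;
            flushed /= mergeFactor;
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(segmentsBeforeOptimize(660_000L, 10, 10));
        System.out.println(segmentsBeforeOptimize(660_000L, 10, 660_000));
    }
}
```

Under this model, 660,000 docs with max buffered docs at 10 leave 12 segments when the merge factor is 10, but 66,000 segments when the merge factor is 660,000, and the final optimize then has to merge all of them at once.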

- Mark

maureen tanuwidjaja wrote:
Hi all,
I am just wondering whether it is sensible and possible, if I have 660,000 documents to be indexed, to set the merge factor to 660,000 instead of the default value of 10 (which means no merging while indexing), and later, after closing the index, use the IndexWriter to optimize/merge the whole index...
      Thanks and Regards,
  Maureen


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


