Found a mistake in my response: when I was talking about max merge docs,
I meant max buffered docs. If you're going to optimize anyway, the key
setting appears to be max buffered docs, and I have yet to see the merge
factor affect anything (again, only if you optimize). Oddly, performance
seems to decrease as you raise max buffered docs, long before you are even
close to running out of available RAM. I do not know why this is, but
you should certainly test to see what your best settings are.
Also, the new benchmarking stuff is awesome.
- Mark
Grant Ingersoll wrote:
You may find contrib/Benchmark useful in your testing. Doron Cohen
has added a nice framework for scripting benchmarking tests.
-Grant
On Feb 11, 2007, at 12:14 PM, Mark Miller wrote:
Not sensible at all. First, a merge factor above something like 90 most
likely never makes sense. Second, I have done some testing and my
results show that if you optimize the index after loading, the merge
factor really doesn't matter, so keep it at 10. (I never used a max
merge docs below 50. 100 worked best; 1,000 and 2,000 slowed things
down even though the test had access to 600MB RAM and the docs were
around 10-20k each.) Setting up a test harness that automatically
indexes a good amount of docs (I did 20,000) with a variety of
settings will tell you a lot. Results will obviously vary based on
your setup.
- Mark
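
A minimal sketch of the kind of test harness described above, assuming nothing
about the original test code. The names SettingsSweep, IndexRun, and Result are
hypothetical. The indexing step is left as a callback so the sketch stays
self-contained; in a real run it would presumably create an IndexWriter, apply
setMergeFactor and setMaxBufferedDocs, add the documents, call optimize, and
close the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: times one indexing run per combination of settings.
final class SettingsSweep {

    // Hypothetical hook. A real implementation would build an index of
    // docCount documents with the given settings and wrap any IOException.
    interface IndexRun {
        void run(int mergeFactor, int maxBufferedDocs, int docCount);
    }

    // One timed run.
    static final class Result {
        final int mergeFactor;
        final int maxBufferedDocs;
        final long millis;

        Result(int mergeFactor, int maxBufferedDocs, long millis) {
            this.mergeFactor = mergeFactor;
            this.maxBufferedDocs = maxBufferedDocs;
            this.millis = millis;
        }
    }

    // Runs every (mergeFactor, maxBufferedDocs) pair once and records wall time.
    static List<Result> sweep(IndexRun run, int[] mergeFactors,
                              int[] maxBufferedDocs, int docCount) {
        List<Result> results = new ArrayList<>();
        for (int mf : mergeFactors) {
            for (int mbd : maxBufferedDocs) {
                long start = System.nanoTime();
                run.run(mf, mbd, docCount);
                long millis = (System.nanoTime() - start) / 1_000_000;
                results.add(new Result(mf, mbd, millis));
            }
        }
        return results;
    }

    // Returns the run with the lowest wall time, or null if there were none.
    static Result fastest(List<Result> results) {
        Result best = null;
        for (Result r : results) {
            if (best == null || r.millis < best.millis) {
                best = r;
            }
        }
        return best;
    }
}
```

A single timed run per setting is noisy, so repeating each combination a few
times and starting from an empty index directory each time would likely give
more trustworthy numbers.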
maureen tanuwidjaja wrote:
Hi all,
I am just wondering whether it is sensible and possible, if I have
660,000 documents to be indexed, to set the merge factor to 660,000
instead of the default value of 10 (meaning no merges while
indexing) and later, after closing the index, use the IndexWriter
to optimize/merge the whole index?
Thanks and Regards,
Maureen
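
A rough way to see what the proposed setting implies is to count segments under
a simplified merge model. This is a sketch only: it assumes one new segment is
flushed every max buffered docs documents, and that segments are merged
whenever merge factor segments accumulate at the same level. The real merge
policy differs in its details, and the class name SegmentModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of segment growth during indexing (not Lucene's actual code).
final class SegmentModel {

    // Returns {segments left before optimize, merges performed while indexing}.
    static long[] simulate(long docCount, int maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        // Assumption: each flush writes exactly one level-0 segment.
        long flushes = (docCount + maxBufferedDocs - 1) / maxBufferedDocs;
        List<Integer> perLevel = new ArrayList<>();
        long merges = 0;
        for (long f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (perLevel.size() == level) {
                    perLevel.add(0);
                }
                int count = perLevel.get(level) + 1;
                if (count < mergeFactor) {
                    perLevel.set(level, count);
                    break;
                }
                // mergeFactor segments at this level: merge them into one
                // segment at the next level and carry on upward.
                perLevel.set(level, 0);
                merges++;
                level++;
            }
        }
        long segments = 0;
        for (int c : perLevel) {
            segments += c;
        }
        return new long[] {segments, merges};
    }

    public static void main(String[] args) {
        long[] dflt = simulate(660_000, 10, 10);
        long[] huge = simulate(660_000, 10, 660_000);
        System.out.println("mergeFactor=10:     segments=" + dflt[0] + " merges=" + dflt[1]);
        System.out.println("mergeFactor=660000: segments=" + huge[0] + " merges=" + huge[1]);
    }
}
```

Under this model, with max buffered docs left at 10, a merge factor of 10 leaves
12 segments after 7,332 merges, while a merge factor of 660,000 leaves 66,000
small segments and performs no merges, so all of the merging work is deferred
to the final optimize.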
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at
http://wiki.apache.org/jakarta-lucene/LuceneFAQ