Andrzej Bialecki wrote:
> It's nice to have these few percent... however, it doesn't solve the
> main problem; I need a 50% or greater increase... :-) and I suspect
> this can be achieved only by some radical changes in the way Nutch uses
> Lucene. It seems the default query structure is too complex to get
> decent performance.
That would certainly help.
For what it's worth, the Internet Archive has ~10M-page Nutch indexes
that perform adequately. See:
http://websearch.archive.org/katrina/
The performance is about what you report, but it is quite usable.
(Please don't stress-test this server!) We recently built a ~100M-page
Nutch index at the Internet Archive that is surprisingly usable on a
single CPU. (This is not yet publicly accessible.)
Perhaps your traffic will be much higher than the Internet Archive's, or
perhaps you have contractual obligations that specify a certain average
query performance. If not, ~10M pages is quite searchable using Nutch on
a single CPU.
Doug