Andrzej Bialecki wrote:
It's nice to have these couple percent... however, it doesn't solve the main problem; I need 50 or more percent increase... :-) and I suspect this can be achieved only by some radical changes in the way Nutch uses Lucene. It seems the default query structure is too complex to get a decent performance.

That would certainly help.

For what it's worth, the Internet Archive has ~10M page Nutch indexes that perform adequately. See:

http://websearch.archive.org/katrina/

The performance is about what you report, but it is quite usable. (Please don't stress-test this server!) We recently built a ~100M page Nutch index at the Internet Archive that is surprisingly usable on a single CPU. (This is not yet publicly accessible.)

Perhaps your traffic will be much higher than the Internet Archive's, or you have contractual obligations that specify certain average query performance, but, if not, ~10M pages is quite searchable using Nutch on a single CPU.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to