Re: Clustering Lucene with 40 Servers

2007-01-02 Thread Peter W.
Hello, I don't have any of the scalability requirements mentioned in this thread, but the problem is an interesting one. Lucene needs a connection pool equivalent, IMHO, or a best-practices method for load balancing. Opening, locking, reading and writing to remote indexes over RMI seems good o

Customize scoring for additive effect...

2007-01-02 Thread escher2k
I am trying to build a scoring function which is additive across multiple fields that are searched. For instance, if a user searches for "Web PHP", I want the search to happen over fld1, fld2 and then compute the score as, score = similarity score(fld1) + similarity score(fld2) + I think I ha
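To make the additive idea above concrete, here is a minimal sketch in plain Python. It is not Lucene's API or its scoring formula: `field_score` is a hypothetical stand-in for a per-field similarity score (it just counts query-term occurrences), and `additive_score`, the sample document and its contents are assumptions made for illustration. Only the field names fld1 and fld2 and the query "Web PHP" come from the post.

```python
from collections import Counter


def field_score(query_terms, field_text):
    """Toy per-field similarity (an assumption): count query-term occurrences."""
    counts = Counter(field_text.lower().split())
    return float(sum(counts[term.lower()] for term in query_terms))


def additive_score(query, doc, fields):
    """Sum the per-field scores, so a match in every field adds to the total."""
    terms = query.split()
    return sum(field_score(terms, doc.get(name, "")) for name in fields)


# Hypothetical document with the two fields named in the post.
doc = {
    "fld1": "PHP web developer",
    "fld2": "Web services and web design",
}

total = additive_score("Web PHP", doc, ["fld1", "fld2"])
print(total)
```

Each field contributes independently here, so a document matching in both fields always scores at least as high as one matching in only one of them, which is the additive behaviour the post describes.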

Speed of grouped queries

2007-01-02 Thread sdeck
Thanks in advance for any insight on this one. I have a fairly large query to run, and it takes roughly 20-40 seconds to complete the way that I have it. Here is the best example I can give. I have a set of roughly 25K documents indexed. I have queries that get documents matching a particular a

Re: IOException - The handle is invalid

2007-01-02 Thread Michael McCandless
Antony Bowesman wrote: Hi Mike, I saw Mike McCandless's JIRA issue http://issues.apache.org/jira/browse/LUCENE-669 Is the patch referenced there useful for a 2.0 system? I would like to use the lockless commit stuff, but am waiting until I get the core system working well. I am also gettin

Re: IOException - The handle is invalid

2007-01-02 Thread Antony Bowesman
Hi Mike, I saw Mike McCandless's JIRA issue http://issues.apache.org/jira/browse/LUCENE-669 Is the patch referenced there useful for a 2.0 system? I would like to use the lockless commit stuff, but am waiting until I get the core system working well. I am also getting IOException in some of

Re: Clustering Lucene with 40 Servers

2007-01-02 Thread Otis Gospodnetic
Yes, if you think about how blogs and Technorati work (new blog post -> ping -> ping server -> technorati -> index ==> searchable blog post), Adam is correct. Since Doug's implementation we (Technorati) have changed our clusters a LOT (think major rewrites). I can't talk about the details, obv