On Thu, 2011-06-02 at 21:51 +0200, Clint Gilbert wrote: > We're also considering a home-grown scheme involving normalizing the > denominators of all the index components in all our indices, based on > the sums of counts obtained from all the indices. This feels like > re-inventing the wheel, and it's not clear to me yet that the low-level > manipulation of indices that we'd need to do is even possible.
We're currently experimenting with this approach, albeit only for two searchers. Since we have very little control of the secondary searcher, just a basic search-API, we're really hacking and performing a query rewrite based on term statistics. This only works for basic term queries (no wildcards, ranges etc.), but fortunately our search logs show that they are by far the most common. The math is not too bad: Extract occurrence counts for the terms, sum them, calculate the difference when sending a request to a specific searcher and set a term boost in the textual query, so that the standard ranking formula in Lucene will yield the desired score. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org