Re: Reducing number of poor results from large BooleanQueries

2005-09-09 Thread markharw00d
Isn't the trouble with introducing a scoring threshold based on raw scores that the Similarity scoring mechanism considers each document in isolation? At this stage we don't know if the query is generally a good one or not (i.e. spelt correctly, and not a Googlewhack combination of rarely co

Re: Reducing number of poor results from large BooleanQueries

2005-09-09 Thread Chris Hostetter
: Here is an approach which works based on the quantity
: of matching terms in an adapted BooleanQuery:
:
: http://issues.apache.org/bugzilla/show_bug.cgi?id=35284
Doh! ... I should really start paying attention to the stuff in SVN, I didn't even know there was a DisjunctionSumScorer -- this is e

Re: Reducing number of poor results from large BooleanQueries

2005-09-09 Thread mark harwood
Hi Chris, Here is an approach which works based on the quantity of matching terms in an adapted BooleanQuery: http://issues.apache.org/bugzilla/show_bug.cgi?id=35284 Paul makes an interesting observation at the end which shows how this functionality can be added to the existing BooleanQuery witho

Reducing number of poor results from large BooleanQueries

2005-09-08 Thread Chris Hostetter
One of the things I'm currently looking into is different ways to approach the more general problem of "filtering by score" in the specific case of BooleanQueries that have a large number of optional terms. Below is a description of the problem I'm considering (with two examples of how it can arri
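
For readers skimming the thread: the idea raised in the replies, thresholding on the quantity of matching optional terms rather than on raw score, can be illustrated roughly as below. This is only a sketch in plain Python, not Lucene code and not the patch attached to the Bugzilla issue. The function name `filter_by_min_matches`, the `min_should_match` parameter, and the toy documents are all made up for illustration, and documents are modelled simply as sets of terms.

```python
from typing import Dict, Iterable, List, Set


def filter_by_min_matches(
    docs: Dict[str, Set[str]],
    optional_terms: Iterable[str],
    min_should_match: int,
) -> List[str]:
    """Return ids of docs containing at least `min_should_match` optional terms.

    Hypothetical helper: a stand-in for a disjunction of optional clauses
    with a minimum-match threshold, not an actual Lucene API.
    """
    terms = set(optional_terms)
    hits = []
    for doc_id, doc_terms in docs.items():
        # Count how many distinct query terms this document contains.
        matched = len(terms & doc_terms)
        if matched >= min_should_match:
            hits.append((matched, doc_id))
    # Most matching terms first; ties broken by doc id for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]


# Toy data, invented for this example.
docs = {
    "d1": {"apache", "lucene", "search", "index"},
    "d2": {"lucene"},
    "d3": {"search", "index", "cooking"},
}
query = ["apache", "lucene", "search", "index", "scorer"]

print(filter_by_min_matches(docs, query, 2))
```

With a threshold of 2, the document that matches only one of the five optional terms is dropped, regardless of how high its raw score for that single term might have been. The threshold depends only on the query and the document's matched-term count, so it sidesteps the objection that raw scores are computed for each document in isolation.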