Have you looked at TopDocCollector? Basically, you can tell it to return only the top N docs by score (N is arbitrary). What you then have is an array of raw score and doc ID pairs AND a max score.
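Here is a minimal sketch of the bucketing idea, in Java since that is Lucene's language. In the Lucene 2.x API, `TopDocCollector.topDocs()` returns a `TopDocs` whose `scoreDocs` array holds `ScoreDoc` entries (`doc`, `score`) and whose `getMaxScore()` gives the max score. So that the sketch runs without Lucene on the classpath, it uses a hypothetical `Hit` class in their place. Grouping by orders of magnitude in the hypothetical `bucketByDecade` method is just one arbitrary way to bucket. It was chosen because field boosts can push scores apart by factors of 10 or more. Dividing by the max score makes the grouping relative, so it does not matter that raw scores are not confined to 0..1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ScoreBuckets {

    // Hypothetical stand-in for Lucene's ScoreDoc so the sketch runs without Lucene:
    // a document ID plus its raw, unnormalized score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Groups doc IDs by how many orders of magnitude their score sits below maxScore:
    // bucket 0 = within 10x of the best hit, bucket 1 = 10x to 100x lower, and so on.
    static TreeMap<Integer, List<Integer>> bucketByDecade(Hit[] hits, float maxScore) {
        TreeMap<Integer, List<Integer>> buckets = new TreeMap<>();
        for (Hit h : hits) {
            if (h.score <= 0f || maxScore <= 0f) {
                continue; // no meaningful ratio; skip the hit
            }
            int decade = (int) Math.floor(Math.log10((double) maxScore / h.score));
            buckets.computeIfAbsent(decade, k -> new ArrayList<>()).add(h.doc);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up scores: two strong hits and two that are hundreds of times weaker.
        Hit[] hits = {
            new Hit(7, 12.0f), new Hit(3, 9.5f), new Hit(12, 0.02f), new Hit(5, 0.01f)
        };
        TreeMap<Integer, List<Integer>> buckets = bucketByDecade(hits, 12.0f);
        for (Map.Entry<Integer, List<Integer>> e : buckets.entrySet()) {
            System.out.println("bucket " + e.getKey() + ": " + e.getValue());
        }
        // Keeping only the best (lowest-numbered) bucket drops the weak tail.
        System.out.println("keep: " + buckets.firstEntry().getValue());
    }
}
```

With these made-up scores, docs 7 and 3 land in bucket 0 and the two weak hits land in buckets 2 and 3. Keeping the first bucket therefore leaves `[7, 3]`.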
NOTE: "raw score" is not normalized, i.e. it is not guaranteed to be between 0 and 1. So now you can examine the scores and put them into buckets any way you want; all you're doing is spinning through a small data structure and performing some calculations.

HTH
Erick

On Mon, May 18, 2009 at 8:52 AM, Joel Halbert <j...@su3analytics.com> wrote:

> Hi,
>
> I'd like to apply a score filter. I realise that filtering by absolute
> (i.e. anything less than x) scores is pretty meaningless.
>
> In my case I want to filter based on relative score, or on some
> function of score which looks for clustering of documents around certain
> score values.
>
> Context: I have set up field boosts such that a query hit on one indexed
> field will, in theory, result in a score one or more orders of magnitude
> greater than a hit on some other field. So if I have 2 fields A and B
> and I'm really, really interested in hits on A, and only interested in
> hits on B if there were none on A, I boost A by 1000, relative to B.
> The resultant score should reflect this.
>
> The ability to do this becomes important when we want to re-order the
> search results around some other field (not score) and are not
> interested in displaying the least relevant documents.
>
> It is an easy thing to write a basic 'document collector/result filter'
> that uses relative score information to filter out documents whose
> score is less than some fraction of the best score, but I'm sure this
> could be more elegantly generalised into some mathematical
> "relevance/significance" model/function which could determine an
> optimal cutoff for documents based on the clustering of results around
> scores.
> E.g. if my top 5 documents are all between score 0.9 and 0.7 and the
> remaining 10 are less than 0.01, then we could sensibly take the top 5
> docs as most relevant.
>
> Has anyone experience of doing such a thing?
>
> Regards,
> Joel
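Joel's clustering example can be approximated with a simple largest-gap heuristic. In that example, the top five hits fall between 0.9 and 0.7 and the remaining ten fall below 0.01. The sketch below is an assumption, not an established relevance model. The hypothetical `keepCount` method walks the scores in descending order, which is the order `TopDocs` returns them in. It cuts at the biggest ratio between consecutive scores, provided that ratio exceeds a caller-chosen threshold, the hypothetical `minRatio` parameter.

```java
public class LargestGapCutoff {

    // Hypothetical helper: given raw scores sorted in descending order, returns how many
    // top hits to keep. The cut is placed at the largest ratio between one score and the
    // next (the biggest relative drop), but only if that ratio exceeds minRatio;
    // otherwise every hit is kept.
    static int keepCount(float[] sortedDesc, double minRatio) {
        int keep = sortedDesc.length;
        double bestRatio = minRatio;
        for (int i = 0; i + 1 < sortedDesc.length; i++) {
            float cur = sortedDesc[i];
            float next = sortedDesc[i + 1];
            double ratio;
            if (next > 0f) {
                ratio = (double) cur / next;
            } else if (cur > 0f) {
                ratio = Double.POSITIVE_INFINITY; // drop from a positive score to zero
            } else {
                ratio = 1.0; // both zero: no drop at all
            }
            if (ratio > bestRatio) {
                bestRatio = ratio;
                keep = i + 1; // keep hits 0..i
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Shape of Joel's example: five hits between 0.9 and 0.7, then ten below 0.01
        // (the individual values are made up).
        float[] scores = {
            0.9f, 0.85f, 0.8f, 0.75f, 0.7f,
            0.009f, 0.008f, 0.007f, 0.006f, 0.005f,
            0.004f, 0.003f, 0.002f, 0.0015f, 0.001f
        };
        System.out.println("keep top " + keepCount(scores, 10.0) + " of " + scores.length);
    }
}
```

This uses ratios rather than absolute differences because a boost of 1000 scales scores multiplicatively. The `minRatio` floor keeps a list with no clear drop from being cut at an arbitrary point. With the values above, it prints `keep top 5 of 15`.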