In that case, I'll have to defer to folks who actually know something about that part of the code <G>.
Erick

On Mon, May 18, 2009 at 9:25 AM, Joel Halbert <j...@su3analytics.com> wrote:

> Hi Erick,
>
> Thanks for the pointer. Sorry if the question was a bit unclear, but
> basically I'm looking to see if anyone has any pointers on the actual
> mathematical functions or models to use (rather than the
> implementation). I'd be really interested to hear what others have used
> to solve this, since ideally I'd like a cutoff point optimised to the
> resultant score values.
>
> J
>
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: Re: relevance function for scores
> Date: Mon, 18 May 2009 09:13:27 -0400
>
> Have you looked at TopDocCollector? Basically, you can tell it to only
> return you the top N docs by score (N is arbitrary).
> What you then have is an array of raw score and doc ID pairs
> AND a max score.
>
> NOTE: "raw score" is not normalized, i.e. is not guaranteed to be
> between 0 and 1.
>
> So now you can examine the scores and put them in buckets any
> way you want; all you're doing is spinning through a small data
> structure performing some calculations.....
>
> HTH
> Erick
>
> On Mon, May 18, 2009 at 8:52 AM, Joel Halbert <j...@su3analytics.com>
> wrote:
>
> > Hi,
> >
> > I'd like to apply a score filter. I realise that filtering by absolute
> > (i.e. anything less than x) scores is pretty meaningless.
> >
> > In my case I want to filter based on relative score, or on some
> > function of score which looks for clustering of documents around certain
> > score values.
> >
> > Context: I have set up field boosts such that a query hit on one indexed
> > field will, in theory, result in a score one or more orders of magnitude
> > greater than a hit on some other field. So if I have 2 fields A and B
> > and I'm really really interested in hits on A, and only interested in
> > hits on B if there were none on A, I boost A by 1000, relative to B.
> > The resultant score should reflect this.
> >
> > The ability to do this becomes important when we want to re-order the
> > search results around some other field (not score) and are not
> > interested in displaying the least relevant documents.
> >
> > It is an easy thing to write a basic 'document collector/result filter'
> > that uses relative score information to filter out documents whose
> > score is less than some magnitude of the best score, but I'm sure this
> > could be more elegantly generalised into some mathematical
> > "relevance/significance" model/function which could determine some
> > optimal cutoff for documents based on the clustering of results around
> > scores.
> > e.g. if my top 5 documents are all between score 0.9 and 0.7 and the
> > remaining 10 are less than 0.01, then we could sensibly take the top 5
> > docs as most relevant.
> >
> > Has anyone experience of doing such a thing?
> >
> > Regards,
> > Joel
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
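The two cutoffs discussed in this thread (keeping hits within some fraction of the max score, and cutting where the scores visibly cluster) can be sketched in plain Java. This is a minimal sketch, assuming the top-N raw scores have already been copied out of the collector's results (in Lucene 2.x, the `score` field of each `ScoreDoc` in `TopDocs.scoreDocs`) into a float array sorted in descending order. `ScoreCutoff`, `relativeCutoff` and `largestGapCutoff` are hypothetical names, not Lucene API, and the largest-gap rule is just one simple heuristic for the "clustering" idea, not an established relevance model.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): decides how many of the top-N hits
// to keep, given their raw scores sorted in descending order.
class ScoreCutoff {

    // Relative threshold: keep every hit scoring at least `fraction` of the best score.
    static int relativeCutoff(float[] sortedScores, float fraction) {
        if (sortedScores.length == 0) {
            return 0;
        }
        float threshold = fraction * sortedScores[0];
        int keep = 0;
        while (keep < sortedScores.length && sortedScores[keep] >= threshold) {
            keep++;
        }
        return keep;
    }

    // Largest-gap heuristic: cut at the biggest drop between consecutive scores,
    // measured as a log ratio so that a fall of two orders of magnitude outweighs
    // a fall from 0.9 to 0.7. Assumes all scores are positive.
    static int largestGapCutoff(float[] sortedScores) {
        int keep = sortedScores.length; // no drop found: keep everything
        double biggestDrop = 0.0;
        for (int i = 1; i < sortedScores.length; i++) {
            double drop = Math.log(sortedScores[i - 1]) - Math.log(sortedScores[i]);
            if (drop > biggestDrop) {
                biggestDrop = drop;
                keep = i; // hits 0..i-1 sit above the gap
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Illustrative values shaped like the example above: five hits between
        // 0.9 and 0.7, then ten hits below 0.01.
        float[] scores = {0.9f, 0.85f, 0.8f, 0.75f, 0.7f,
                          0.009f, 0.008f, 0.007f, 0.006f, 0.005f,
                          0.004f, 0.003f, 0.002f, 0.0015f, 0.001f};
        int byRatio = relativeCutoff(scores, 0.1f);
        int byGap = largestGapCutoff(scores);
        System.out.println("relative cutoff keeps " + byRatio + " hits"); // 5
        System.out.println("largest-gap cutoff keeps " + byGap + " hits"); // 5
        System.out.println(Arrays.toString(Arrays.copyOf(scores, byGap)));
    }
}
```

Both rules work on ratios of scores, so they don't depend on the raw scores being normalized to 0..1. A fixed fraction is easy to reason about but has to be tuned to the boost scheme; the largest-gap rule adapts to whatever scores come back, but it will always find *some* gap, so it may be worth pairing it with a minimum drop before trusting the cut.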