Re: relevance function for scores

Erick Erickson Mon, 18 May 2009 06:50:42 -0700

In that case, I'll have to defer to folks who actually know something about
that part of the code <G>.


Erick

On Mon, May 18, 2009 at 9:25 AM, Joel Halbert <j...@su3analytics.com> wrote:

> Hi Erick,
>
> Thanks for the pointer. Sorry if the question was a bit unclear but
> basically I'm looking to see if anyone has any pointers on the actual
> mathematical functions or models to use (rather than the
> implementation). I'd be really interested to hear what others have used
> to solve this - since ideally I'd like a cutoff point optimised to the
> resultant score values.
>
> J
>
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: Re: relevance function for scores
> Date: Mon, 18 May 2009 09:13:27 -0400
>
> Have you looked at TopDocCollector? Basically, you can tell it to
> return only the top N docs by score (N is arbitrary).
> What you then have is an array of raw score and doc ID pairs
> AND a max score.
>
> NOTE: "raw score" is not normalized, i.e. is not guaranteed to be
> between 0 and 1.
>
> So now you can examine the scores and put them in buckets any
> way you want, all you're doing is spinning through a small data
> structure performing some calculations.....
>
> HTH
> Erick
>
> On Mon, May 18, 2009 at 8:52 AM, Joel Halbert <j...@su3analytics.com>
> wrote:
>
> > Hi,
> >
> > I'd like to apply a score filter. I realise that filtering by absolute
> > (i.e. anything less than x) scores is pretty meaningless.
> >
> > In my case I want to filter based on relative score - or on some
> > function of score which looks for clustering of documents around certain
> > score values.
> >
> > Context: I have set up field boosts such that a query hit on one indexed
> > field will, in theory, result in a score one or more orders of magnitude
> > greater than a hit on some other field. So if I have 2 fields A and B
> > and I'm really really interested in hits on A, and only interested in
> > hits on B if there were none on A,  I boost A by 1000, relative to B.
> > The resultant score should reflect this.
> >
> > The ability to do this becomes important when we want to re-order the
> > search results around some other field (not score) and are not
> > interested in displaying the least relevant documents.
> >
> >
> > It is an easy thing to write a basic 'document collector/result filter'
> > that uses relative score information to filter out documents whose
> > score is less than some fraction of the best score, but I'm sure this
> > could be more elegantly generalised into some mathematical
> > "relevance/significance" model/function  which could determine some
> > optimal cutoff for documents based on the clustering of results around
> > scores.
> > e.g. if my top 5 documents are all between score 0.9 and 0.7 and the
> > remaining 10 are less than 0.01 then we could sensibly take the top 5
> > docs as most relevant.
> >
> > Has anyone experience of doing such a thing?
> >
> >
> > Regards,
> > Joel
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
>
>
>
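For readers following the thread, the two cutoff ideas Joel describes (a fixed fraction of the best score, and a cut at the point where scores fall away sharply) can be sketched in plain Java. This is only an illustration, not something posted in the thread and not part of Lucene: the class ScoreCutoff, its method names, and the fraction and minRatio parameters are all hypothetical, and the code assumes the raw scores have already been copied out of the top-N hits into an array of positive values sorted highest first.

```java
// Hypothetical helper, not part of Lucene. Both methods return how many of the
// leading hits to keep, given positive scores sorted in descending order.
final class ScoreCutoff {

    /** Keep every hit whose score is at least `fraction` of the best score. */
    static int relativeCutoff(float[] scores, float fraction) {
        if (scores.length == 0) {
            return 0;
        }
        float threshold = scores[0] * fraction;
        int keep = 0;
        while (keep < scores.length && scores[keep] >= threshold) {
            keep++;
        }
        return keep;
    }

    /**
     * Cut at the largest relative drop between neighbouring scores, provided
     * that drop is bigger than `minRatio`. With no such drop, keep everything.
     */
    static int largestGapCutoff(float[] scores, double minRatio) {
        int keep = scores.length;     // default: no clear gap found
        double bestRatio = minRatio;  // a drop must exceed this to count
        for (int i = 1; i < scores.length; i++) {
            // Ratio of the previous score to this one; a non-positive score
            // is treated as an unbounded drop.
            double ratio = scores[i] > 0f
                    ? (double) scores[i - 1] / scores[i]
                    : Double.POSITIVE_INFINITY;
            if (ratio > bestRatio) {
                bestRatio = ratio;
                keep = i;
            }
        }
        return keep;
    }
}
```

With scores shaped like Joel's example (five hits between 0.9 and 0.7 followed by hits below 0.01), both relativeCutoff(scores, 0.1f) and largestGapCutoff(scores, 10.0) keep the first five. The ratio test suits the boost scheme described in the thread, where a hit on field A is meant to score orders of magnitude above a hit on field B; the value of minRatio is a guess that would need tuning against real queries.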
