Re: relevance function for scores

2009-05-27 Thread kenny kim
pared to a vanilla search. -Original Message- From: kenny kim Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for scores Date: Wed, 27 May 2009 19:18:39 +0900 It seems to be a good solution. However, I think it may take some processing t

Re: relevance function for scores

2009-05-27 Thread Joel Halbert
-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for scores Date: Wed, 27 May 2009 19:18:39 +0900 It seems to be a good solution. However, I think it may take some processing time to get the distribution of all matching documents before scoring each doc. Would you h

Re: relevance function for scores

2009-05-27 Thread kenny kim
- From: Babak Farhang Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for scores Date: Mon, 25 May 2009 16:11:32 -0600 Woops. Got that backwards.. should read if (score[n] / score[n-1]) < c / (boost_factor) On Mon, May 25, 2009 a

Re: relevance function for scores

2009-05-26 Thread Joel Halbert
- From: Babak Farhang Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for scores Date: Mon, 25 May 2009 16:11:32 -0600 Woops. Got that backwards.. should read > if (score[n] / score[n-1]) < c / (boost_factor) On Mon, May 25, 2009 at 4

Re: relevance function for scores

2009-05-25 Thread kenny kim
Hi, I think you and I are looking for the same thing. I believe that it can dramatically reduce search time for my heavy indexes. Could you let me know if you find a good solution? Have a good day. On 2009-05-18, at 9:52 PM, Joel Halbert wrote: Hi, I'd like to apply a score filter

Re: relevance function for scores

2009-05-25 Thread Babak Farhang
Woops. Got that backwards.. should read > if (score[n] / score[n-1]) < c / (boost_factor) On Mon, May 25, 2009 at 4:10 PM, Babak Farhang wrote: > How about determining the cutoff by measuring the percentage > difference between successive scores: if the score drops by a > threshold amount the

Re: relevance function for scores

2009-05-25 Thread Babak Farhang
How about determining the cutoff by measuring the percentage difference between successive scores: if the score drops by a threshold amount then you've hit the cutoff. In the example you mention, you might want to try something like c/1000, where 1 < c < 25 is a constant (experiment to find a swee
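A minimal sketch of the successive-score idea, using the corrected comparison from the follow-up message (score[n] / score[n-1] < c / boost_factor). It assumes the scores are already sorted in descending order. The class name ScoreCutoff, the method name cutoffIndex, and the sample values are illustrative assumptions, not Lucene API.

```java
public class ScoreCutoff {

    /**
     * Returns the number of leading hits to keep.
     * Assumption: scores are sorted in descending order.
     * The cutoff is the first index n where
     * scores[n] / scores[n-1] < c / boostFactor.
     */
    public static int cutoffIndex(float[] scores, double c, double boostFactor) {
        double threshold = c / boostFactor;
        for (int n = 1; n < scores.length; n++) {
            // The ratio is undefined for a non-positive previous score,
            // so stop scanning there and keep the hits seen so far.
            if (scores[n - 1] <= 0f) {
                return n;
            }
            double ratio = scores[n] / scores[n - 1];
            if (ratio < threshold) {
                return n; // hits 0..n-1 are above the drop
            }
        }
        return scores.length; // no large drop found: keep everything
    }

    public static void main(String[] args) {
        // Hypothetical scores: two boosted hits, then a sharp drop.
        float[] scores = {950.0f, 900.0f, 1.2f, 1.1f, 0.9f};
        int keep = cutoffIndex(scores, 10.0, 1000.0);
        System.out.println("keep the first " + keep + " hits");
    }
}
```

With c = 10 and a boost factor of 1000 the threshold is 0.01, so only a drop of two orders of magnitude or more between neighbouring scores triggers the cutoff. The constant c would need tuning against real result sets, as the message suggests.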

Re: relevance function for scores

2009-05-18 Thread Joel Halbert
a cutoff point optimised to the > resultant score values. > > J > > -Original Message- > From: Erick Erickson > Reply-To: java-user@lucene.apache.org > To: java-user@lucene.apache.org > Subject: Re: relevance function for scores > Date: Mon, 18 May 2009 09:13:27 -0

Re: relevance function for scores

2009-05-18 Thread Erick Erickson
> > J > > -Original Message- > From: Erick Erickson > Reply-To: java-user@lucene.apache.org > To: java-user@lucene.apache.org > Subject: Re: relevance function for scores > Date: Mon, 18 May 2009 09:13:27 -0400 > > Have you looked at TopDocCollector? Basically,

Re: relevance function for scores

2009-05-18 Thread Joel Halbert
solve this - since ideally I'd like a cutoff point optimised to the resultant score values. J -Original Message- From: Erick Erickson Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for scores Date: Mon, 18 May 2009 09:13:27 -0400 Hav

Re: relevance function for scores

2009-05-18 Thread Erick Erickson
Have you looked at TopDocCollector? Basically, you can tell it to return only the top N docs by score (N is arbitrary). What you then have is an array of raw score and doc ID pairs AND a max score. NOTE: "raw score" is not normalized, i.e. is not guaranteed to be between 0 and 1. So now you ca
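The preview is cut off before it says what to do with these pairs. One possibility, assumed here rather than taken from the message, is to divide each raw score by the max score and keep only hits above a chosen fraction. The sketch below uses plain parallel arrays as a stand-in for the collector's doc ID and score pairs; the class RelativeScoreFilter, the method keepRelativeToMax, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeScoreFilter {

    /**
     * Returns the doc IDs whose raw score is at least minFraction of maxScore.
     * docs[i] and scores[i] describe the same hit (hypothetical stand-in for
     * the doc ID / raw score pairs a top-N collector hands back).
     */
    public static List<Integer> keepRelativeToMax(int[] docs, float[] scores,
                                                  float maxScore, double minFraction) {
        List<Integer> kept = new ArrayList<>();
        if (maxScore <= 0f) {
            return kept; // nothing meaningful to normalize against
        }
        for (int i = 0; i < docs.length; i++) {
            // Normalized score lies in (0, 1] when maxScore is the true maximum.
            double normalized = scores[i] / maxScore;
            if (normalized >= minFraction) {
                kept.add(docs[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] docs = {7, 3, 12};
        float[] scores = {4.0f, 2.5f, 0.4f};
        // Keep hits scoring at least half of the best hit.
        System.out.println(keepRelativeToMax(docs, scores, 4.0f, 0.5));
    }
}
```

This filters relative to the best hit of the current query only, so the fraction is comparable within one result set but not across queries.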

relevance function for scores

2009-05-18 Thread Joel Halbert
Hi, I'd like to apply a score filter. I realise that filtering by absolute (i.e. anything less than x) scores is pretty meaningless. In my case I want to filter based on relative score - or on some function of score which looks for clustering of documents around certain score values. Context: I