Re: Top Score Collector

2007-04-23 Thread jafarim
I am trying to fetch results similar to a Document in the index. The problem is the myriad of irrelevant hits whose score is less than 1 percent. I was thinking of writing this class in order to omit these results. I can't use TopDoc because the number of *really* similar results can be known a

Re: Top Score Collector

2007-04-22 Thread Erick Erickson
As to point <2>, the only way I was able to deal with this was by using a TopDocs, which does have a max score. But in that case, I don't believe you can limit the number of hits examined. I've just got to ask... Why do you (jafarim) want to fiddle with the threshold? How is this going to benefi
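The max-score idea above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: `RelativeScoreFilter`, `keep`, and `minFraction` are hypothetical names, and the scores are assumed to have been copied out of the search results into a plain array already. It keeps the hits whose score is at least a given fraction of the best score for the same query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene. It works on a plain array of
// raw scores taken from a single query's results.
class RelativeScoreFilter {

    /**
     * Returns the positions of the scores that are at least minFraction
     * of the largest score in the array (for example 0.01f for 1 percent).
     */
    static List<Integer> keep(float[] scores, float minFraction) {
        // Find the best score; this plays the role of the max score.
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }

        List<Integer> kept = new ArrayList<>();
        if (max <= 0f) {
            return kept; // nothing to normalize against
        }

        // Keep a hit only if its score relative to the best hit is high enough.
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] / max >= minFraction) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

The fraction is relative to the best hit of one query only, so the caveat raised elsewhere in this thread still applies: the same fraction may mean different things for different queries.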

Re: Top Score Collector

2007-04-22 Thread jafarim
> Be aware that score thresholds don't work well in general since scores aren't really comparable from one query to another.
What if I normalize the scores so that they fall between 0 and 1? --jaf

Re: Top Score Collector

2007-04-22 Thread Yonik Seeley
On 4/22/07, jafarim <[EMAIL PROTECTED]> wrote:
> I am trying to implement some TopScoreHitCollector class; a kind of TopDocCollector which collects the documents whose score is higher than a threshold. The threshold will be configurable in the constructor of the class. There is seemingly a d

Re: Top Score Collector

2007-04-22 Thread Yonik Seeley
On 4/22/07, jafarim <[EMAIL PROTECTED]> wrote:
> > Be aware that score thresholds don't work well in general since scores aren't really comparable from one query to another.
> What if I normalize the scores so that they fall between 0 and 1?
Two issues with that: 1) You never

Top Score Collector

2007-04-22 Thread jafarim
Hi list. I am trying to implement some TopScoreHitCollector class; a kind of TopDocCollector which collects the documents whose score is higher than a threshold. The threshold will be configurable in the constructor of the class. There is seemingly a document starvation about TopDocCollecto
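A minimal sketch of the collector described in this post, assuming only the `collect(int doc, float score)` callback shape of Lucene's HitCollector. The class below is a hypothetical stand-in and does not extend any Lucene class, so that it compiles on its own; `ThresholdHitCollector`, `Hit`, and `hits()` are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not a Lucene class. It mirrors the
// collect(int doc, float score) callback and keeps only hits whose
// raw score is strictly higher than the configured threshold.
class ThresholdHitCollector {

    /** One collected hit: document number plus its raw score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final float threshold;
    private final List<Hit> hits = new ArrayList<>();

    /** The threshold is fixed at construction time, as proposed in the post. */
    ThresholdHitCollector(float threshold) {
        this.threshold = threshold;
    }

    /** Called once per matching document; low-scoring documents are dropped. */
    void collect(int doc, float score) {
        if (score > threshold) {
            hits.add(new Hit(doc, score));
        }
    }

    /** Hits in the order they were collected (not sorted by score). */
    List<Hit> hits() {
        return hits;
    }
}
```

In a real search the callback would be driven by the searcher. Note that the scores seen inside the callback are raw, so a fixed threshold is compared against unnormalized values.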