Hi,

I've tried this "fair" similarity with lucene 2.2 but it does not seems to
work.

I've attached the custom "MyFair" similarity to bith IndexWriter and
IndexSearcher.

Do you have any idea ?

Thanks a lot,

Fabrice


Daniel Naber-5 wrote:
> 
> Hi,
> 
> as some of you may have noticed, Lucene prefers shorter documents over 
> longer ones, i.e. shorter documents get a higher ranking, even if the 
> ratio "matched terms / total terms in document" is the same.
> 
> For example, take these two artificial documents:
> 
> doc1: x 2 3 4 5 6 7 8 9 10
> doc2: x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 
> When searching for "x" doc1 will get a higher ranking, even though "x" 
> makes up 1/10 of the terms in both documents.
> 
> Using this similarity implementation seems to "fix" that:
> 
> class MySim extends DefaultSimilarity {
>  
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / numTerms);
>   }
>   
>   public float tf(float freq) {
>     return (float)freq;
>   }
> 
> }
> 
> It's basically just the default implementation with Math.sqrt() removed.
> Is 
> this the correct approach? Are there any problems to expect? I just tested 
> it with the documents cited above.
> 
> The use case is that I want to boost fields, e.g. "body:foo^2 title:blah". 
> This could lead to strange results if title is already preferred just 
> because it's shorter.
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/a-%22fair%22-similarity-tp5806739p14992681.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to