Hi, I've tried this "fair" similarity with Lucene 2.2, but it does not seem to work.
I've attached the custom "MyFair" similarity to both IndexWriter and IndexSearcher.

Do you have any idea?

Thanks a lot,
Fabrice

Daniel Naber-5 wrote:
>
> Hi,
>
> as some of you may have noticed, Lucene prefers shorter documents over
> longer ones, i.e. shorter documents get a higher ranking, even if the
> ratio "matched terms / total terms in document" is the same.
>
> For example, take these two artificial documents:
>
> doc1: x 2 3 4 5 6 7 8 9 10
> doc2: x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>
> When searching for "x" doc1 will get a higher ranking, even though "x"
> makes up 1/10 of the terms in both documents.
>
> Using this similarity implementation seems to "fix" that:
>
> class MySim extends DefaultSimilarity {
>
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / numTerms);
>   }
>
>   public float tf(float freq) {
>     return (float)freq;
>   }
>
> }
>
> It's basically just the default implementation with Math.sqrt() removed.
> Is this the correct approach? Are there any problems to expect? I just
> tested it with the documents cited above.
>
> The use case is that I want to boost fields, e.g. "body:foo^2 title:blah".
> This could lead to strange results if title is already preferred just
> because it's shorter.
>
> Regards
>  Daniel
>
> --
> http://www.danielnaber.de
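
For reference, here is a minimal, Lucene-free sketch of the arithmetic behind the quoted MySim class, applied to doc1 and doc2 from the example. The class name FairSimilaritySketch and its two helper methods are hypothetical and exist only for illustration. The "default" formulas are the ones the quoted message implies (the same code with Math.sqrt() applied). The sketch multiplies only tf and lengthNorm; it ignores idf, queryNorm, coord, boosts, and however Lucene stores norms in the index.

```java
// Hypothetical illustration class; it does not use or depend on Lucene.
class FairSimilaritySketch {

    // Default-style factor: tf = sqrt(freq), lengthNorm = 1 / sqrt(numTerms).
    static float defaultFactor(int freq, int numTerms) {
        float tf = (float) Math.sqrt(freq);
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return tf * lengthNorm;
    }

    // "Fair" factor from the quoted MySim: tf = freq, lengthNorm = 1 / numTerms.
    static float fairFactor(int freq, int numTerms) {
        float tf = (float) freq;
        float lengthNorm = (float) (1.0 / numTerms);
        return tf * lengthNorm;
    }

    public static void main(String[] args) {
        // doc1: "x" occurs once in 10 terms; doc2: "x" occurs twice in 20 terms.
        System.out.println("default doc1: " + defaultFactor(1, 10));
        System.out.println("default doc2: " + defaultFactor(2, 20));
        System.out.println("fair    doc1: " + fairFactor(1, 10));
        System.out.println("fair    doc2: " + fairFactor(2, 20));
    }
}
```

In exact arithmetic both variants give the two documents the same tf * lengthNorm product: about 0.316 with the square roots, and 0.1 without them. Any ranking difference seen in practice would therefore have to come from another part of the scoring. One candidate, as far as I know, is that Lucene computes lengthNorm at indexing time and stores it in a lossy single-byte encoding. If that is right, an index built before the custom similarity was set on the IndexWriter would need to be rebuilt for the new lengthNorm to take effect. This is an assumption worth checking against the Similarity Javadoc.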