Hi,

as some of you may have noticed, Lucene prefers shorter documents over longer ones, i.e. shorter documents get a higher ranking, even if the ratio "matched terms / total terms in document" is the same.

For example, take these two artificial documents:

doc1: x 2 3 4 5 6 7 8 9 10
doc2: x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

When searching for "x", doc1 will get a higher ranking, even though "x" makes up 1/10 of the terms in both documents. Using this similarity implementation seems to "fix" that:

    class MySim extends DefaultSimilarity {

      public float lengthNorm(String fieldName, int numTerms) {
        return (float)(1.0 / numTerms);
      }

      public float tf(float freq) {
        return (float)freq;
      }

    }

It's basically just the default implementation with Math.sqrt() removed. Is this the correct approach? Are there any problems to expect? So far I have only tested it with the two documents cited above.

The use case is that I want to boost fields, e.g. "body:foo^2 title:blah". This could lead to strange results if title is already preferred just because it's shorter.

Regards
 Daniel

-- 
http://www.danielnaber.de
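
P.S. To make the numbers concrete, here is a small standalone sketch that computes only the product tf * lengthNorm for doc1 and doc2, once with the square roots and once without. It is plain Java with no Lucene classes; the class and method names are made up for illustration. It also ignores idf, coord, boosts and the way norms are stored in the index, so it is not what the real scorer computes, just the arithmetic behind the two methods.

```java
// Standalone sketch: no Lucene classes involved, only the two formulas.
// LengthNormSketch, sqrtScore and linearScore are hypothetical names.
final class LengthNormSketch {

    // tf * lengthNorm with the square roots, as in the default implementation.
    static float sqrtScore(int freq, int numTerms) {
        float tf = (float) Math.sqrt(freq);
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return tf * lengthNorm;
    }

    // tf * lengthNorm without the square roots, as in MySim.
    static float linearScore(int freq, int numTerms) {
        float tf = (float) freq;
        float lengthNorm = (float) (1.0 / numTerms);
        return tf * lengthNorm;
    }

    public static void main(String[] args) {
        // doc1: "x" occurs 1 time in 10 terms; doc2: 2 times in 20 terms.
        System.out.println("doc1 sqrt:   " + sqrtScore(1, 10));
        System.out.println("doc2 sqrt:   " + sqrtScore(2, 20));
        System.out.println("doc1 linear: " + linearScore(1, 10));
        System.out.println("doc2 linear: " + linearScore(2, 20));
    }
}
```

In this simplified arithmetic the products for doc1 and doc2 come out (nearly) identical in both variants, since sqrt(1)/sqrt(10) equals sqrt(2)/sqrt(20) on paper. So the sketch alone does not reproduce the ranking difference I see; my guess is that it comes from one of the parts left out here, but I have not verified that.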