I had a situation where I was only interested in whether the term was there or not (not how many times), and I didn't want to penalize long fields. So I wrote a Similarity subclass in which I overrode the following methods:
    // No length penalty: every non-empty field gets the same norm.
    public float lengthNorm(String fieldName, int numTerms) {
        return numTerms > 0 ? 1.0f : 0.0f;
    }

    // Binary tf: only whether the term occurs matters, not how often.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

Then I made this subclass the default similarity. It worked well for tf but not for lengthNorm. The reason appears to be that the TermScorer class does not call lengthNorm but instead uses a cache implemented as a static array in Similarity, exposed through static methods in Similarity. Since TermScorer calls these static methods, changing the default similarity has no effect in this regard.

So I ended up customizing core Lucene by changing the following code in Similarity:

    static {
        // Decode every norm byte to 1.0 so field length never affects the score.
        for (int i = 0; i < 256; i++)
            NORM_TABLE[i] = 1.0f; // Originally: NORM_TABLE[i] = SmallFloat.byte315ToFloat((byte) i);
    }

This worked well, but I had hoped not to have to change core Lucene, so if anyone has another or better solution, I would appreciate some tips.

MHH
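A self-contained sketch of the mechanism described above, to show where each hook takes effect. It assumes Lucene 2.x behavior as I understand it: DefaultSimilarity uses tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms); lengthNorm is evaluated when a document is indexed and its result is stored as a one-byte norm; at search time that byte is only decoded through the static 256-entry table. The classes `Sim` and `PresenceSim` and the helpers `indexNorm` and `termScore` are hypothetical stand-ins, not Lucene API, and `encodeNorm`/`byte315ToFloat` are modeled on `SmallFloat.floatToByte315`/`byte315ToFloat` rather than copied from Lucene.

```java
/**
 * Hypothetical model of how tf() and lengthNorm() feed a Lucene-style term score.
 * Not Lucene code: Sim, PresenceSim, indexNorm and termScore are illustrative names.
 */
public class NormSketch {

    /** Default-style formulas (assumed to match DefaultSimilarity): sqrt(freq), 1/sqrt(numTerms). */
    static class Sim {
        float tf(float freq) { return (float) Math.sqrt(freq); }
        float lengthNorm(int numTerms) { return (float) (1.0 / Math.sqrt(numTerms)); }
    }

    /** The overrides from the post: term presence only, no length penalty. */
    static class PresenceSim extends Sim {
        @Override float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }
        @Override float lengthNorm(int numTerms) { return numTerms > 0 ? 1.0f : 0.0f; }
    }

    private static final int FZERO = (63 - 15) << 3; // offset that places float 1.0 at byte 124

    /** One-byte lossy encoding modeled on SmallFloat.floatToByte315. */
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallfloat >= FZERO + 0x100) {
            return -1; // overflow saturates at the largest representable value
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Inverse of encodeNorm, modeled on SmallFloat.byte315ToFloat. */
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Static decode table shared by every scorer, analogous to Similarity.NORM_TABLE. */
    static final float[] NORM_TABLE = new float[256];
    static {
        for (int i = 0; i < 256; i++) NORM_TABLE[i] = byte315ToFloat((byte) i);
    }

    static float decodeNorm(byte b) { return NORM_TABLE[b & 0xff]; }

    /** Index time: the similarity in effect when the document is written picks the stored byte. */
    static byte indexNorm(Sim sim, int numTerms) { return encodeNorm(sim.lengthNorm(numTerms)); }

    /** Search time: tf comes from the current similarity, the norm only from the stored byte. */
    static float termScore(Sim sim, float freq, byte storedNorm) {
        return sim.tf(freq) * decodeNorm(storedNorm);
    }

    public static void main(String[] args) {
        Sim searchSim = new PresenceSim();
        for (int len : new int[] {1, 4, 100}) {
            byte oldNorm = indexNorm(new Sim(), len);         // written before switching similarity
            byte newNorm = indexNorm(new PresenceSim(), len); // written after re-indexing
            System.out.printf("len=%3d  old index: %.4f  re-indexed: %.4f%n",
                    len, termScore(searchSim, 3, oldNorm), termScore(searchSim, 3, newNorm));
        }
    }
}
```

If this model matches your Lucene version, it suggests why the static table looked like the culprit: norms written under the old similarity still decode to 0.5 or 0.09375 for the longer fields, while the same fields indexed with `PresenceSim` all decode to 1.0. So before patching NORM_TABLE it may be worth checking whether the index was rebuilt after installing the custom similarity (for example with Similarity.setDefault before the IndexWriter is created, or with IndexWriter.setSimilarity).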
: I am doing some documentation on scoring and I am interested in use
: cases people have for overriding the DefaultSimilarity. If you can
: share what you did and why you did it, it would be much appreciated.

I touched on this a little bit when I committed SweetSpotSimilarity...

http://www.nabble.com/Re%3A-SweetSpotSimiliarity-p4536312.html

...really any situation where you know more about your data than just that it's "text" is a situation where it *might* make sense to override your Similarity.

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]