Shingo Sasaki created LUCENE-5005:
-------------------------------------
Summary: Length norm value of DefaultSimilarity for a few terms
Key: LUCENE-5005
URL: https://issues.apache.org/jira/browse/LUCENE-5005
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: 4.0
Reporter: Shingo Sasaki
Priority: Minor
The lengthNorm method of DefaultSimilarity is as follows:
{noformat}
public float lengthNorm(FieldInvertState state) {
  final int numTerms;
  if (discountOverlaps)
    numTerms = state.getLength() - state.getNumOverlap();
  else
    numTerms = state.getLength();
  return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
}
{noformat}
Apart from the field boost, the return value is determined by (1.0 / Math.sqrt(numTerms)).
Its type is float, but the value is encoded into a single byte by
SmallFloat.floatToByte315, which loses precision.
||term count||1/sqrt(numTerms)||1/sqrt(numTerms) after byte encoding||
|1| 1.000000| 1.0000|
|2| 0.707107| 0.6250|
|3| 0.577350| 0.5000|
|4| 0.500000| 0.5000|
|5| 0.447214| 0.4375|
As a result, the length norm of a 3-term field is the same as that of a 4-term field.
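
The table above can be reproduced with a small standalone program. This is only a sketch: the two helper methods are re-implemented locally and are assumed to behave like Lucene's SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat. The class name LengthNormDemo and the method storedNorm are hypothetical names used for illustration, and the field boost is assumed to be 1.0.

```java
// Standalone sketch; does not depend on Lucene.
// LengthNormDemo and storedNorm are hypothetical names for illustration.
class LengthNormDemo {

    // Local re-implementation, assumed to match SmallFloat.floatToByte315.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Zero and negative values map to 0; tiny positive values map to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to the largest encodable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Local re-implementation, assumed to match SmallFloat.byte315ToFloat.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Length norm as it would be read back after the byte round trip,
    // assuming a boost of 1.0.
    static float storedNorm(int numTerms) {
        float norm = (float) (1.0 / Math.sqrt(numTerms));
        return byte315ToFloat(floatToByte315(norm));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d terms: exact=%.6f stored=%.4f%n",
                    n, 1.0 / Math.sqrt(n), storedNorm(n));
        }
    }
}
```

Under these assumptions, storedNorm(3) and storedNorm(4) both come out as 0.5, matching the collision shown in the table.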