FuzzyQuery minimumSimilarity

Damian Birchler Mon, 05 Nov 2012 08:01:21 -0800

Hi there

Lucene calculates the string similarity between two strings s1 and s2 according 
to the formula


Similarity = 1 - Levenshtein-Distance(s1,s2)/min(Length(s1),Length(s2))

I would have expected Lucene to divide by the length of the longer string. After 
all, the above formula could, in my understanding, lead to a negative 
similarity, since the Levenshtein distance can be as large as the length of the 
longer string.
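A small Java sketch makes the concern concrete. It assumes the similarity is 1 minus the Levenshtein distance divided by the length of the shorter string (the only reading under which a negative value can occur) and compares it with dividing by the longer string. The class and method names (FuzzySimilaritySketch, levenshtein, similarityMin, similarityMax) are hypothetical; this is not Lucene's code, and Lucene's actual implementation may differ in details such as prefix handling or early cut-offs.

```java
// Hypothetical sketch of the formula discussed above; NOT Lucene's FuzzyTermEnum code.
// Assumes both strings are non-empty (an empty string would divide by zero).
public class FuzzySimilaritySketch {

    // Classic Levenshtein distance (insert, delete, substitute, each cost 1),
    // computed with two rolling rows of the dynamic-programming table.
    static int levenshtein(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j; // turning an empty prefix of s1 into s2[0..j) takes j inserts
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i; // turning s1[0..i) into an empty string takes i deletes
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    // 1 - d / min(len1, len2): can go below 0 when d exceeds the shorter length.
    static double similarityMin(String s1, String s2) {
        return 1.0 - (double) levenshtein(s1, s2) / Math.min(s1.length(), s2.length());
    }

    // 1 - d / max(len1, len2): stays in [0, 1], since d <= max(len1, len2).
    static double similarityMax(String s1, String s2) {
        return 1.0 - (double) levenshtein(s1, s2) / Math.max(s1.length(), s2.length());
    }

    public static void main(String[] args) {
        String s1 = "ab";
        String s2 = "abcdef";
        System.out.println("distance       = " + levenshtein(s1, s2));   // 4 (four inserts)
        System.out.println("min-normalized = " + similarityMin(s1, s2)); // 1 - 4/2 = -1.0
        System.out.println("max-normalized = " + similarityMax(s1, s2)); // 1 - 4/6, about 0.333
    }
}
```

With "ab" and "abcdef" the distance is 4 (four insertions), so dividing by the shorter length gives 1 - 4/2 = -1, while dividing by the longer length gives 1 - 4/6, roughly 0.33.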

Why does Lucene calculate the similarity in this way?

Cheers,
Damian
