Hi there

Lucene calculates the similarity between two strings s1 and s2 according to 
the formula

Similarity = 1 - Levenshtein-Distance(s1,s2) / min(Length(s1), Length(s2))

I would have expected Lucene to divide by the length of the longer string. In 
particular, as I understand it, the formula above can produce a negative 
similarity, since the Levenshtein distance can be as large as the length of the 
longer string, and hence larger than the length of the shorter one.
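To make the concern concrete, here is a minimal Python sketch comparing the two normalizations. It assumes plain character-level Levenshtein distance with unit costs and non-empty input strings; the function names (levenshtein, similarity_min, similarity_max) are hypothetical, and this is not Lucene's actual code.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute; each costs 1)."""
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity_min(s1: str, s2: str) -> float:
    """1 - d / min(len): normalize by the shorter string (can go negative)."""
    return 1.0 - levenshtein(s1, s2) / min(len(s1), len(s2))


def similarity_max(s1: str, s2: str) -> float:
    """1 - d / max(len): normalize by the longer string (stays within [0, 1])."""
    return 1.0 - levenshtein(s1, s2) / max(len(s1), len(s2))


if __name__ == "__main__":
    for a, b in [("lucene", "lucine"), ("ab", "abcdef")]:
        print(a, b, levenshtein(a, b), similarity_min(a, b), similarity_max(a, b))
```

For "ab" vs "abcdef" the distance is 4, so the min-based score is 1 - 4/2 = -1, while the max-based score is 1 - 4/6, roughly 0.33. Because the distance never exceeds the length of the longer string, dividing by that length keeps the score non-negative.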

Why does Lucene calculate the similarity in this way?

Cheers,
Damian
