Github user kinow commented on a diff in the pull request: https://github.com/apache/commons-text/pull/85#discussion_r204208775 --- Diff: src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java --- @@ -270,12 +270,6 @@ private static int limitedCompare(CharSequence left, CharSequence right, final i final int max = j > Integer.MAX_VALUE - threshold ? n : Math.min( n, j + threshold); - // the stripe may lead off of the table if s and t are of different - // sizes - if (min > max) { --- End diff -- A good catch I think @luciano-medallia . @chtompki here's what I understood from reading the code and debugging a bit. If you have the strings s1="papa" and s2="papakura", and you use the threshold 3, the algorithm won't initiate, and we will fail earlier. That's because the we are saying that any edit distance under the threshold (3) must be ignored, and we would now be checking that the difference between the length of both (4) is greater than the threshold. So as the difference between both is 4, and it is greater than our threshold of 3, there's no need to enter the main algorithm. If we increase the threshold to 4, then we get the distance of 4 as it's ok according to the threshold in our code. The part of the code removed I assumed was a constraint removed by this new check. Also debugged with different values and couldn't find a combination to trigger that if. So I think the change is OK. All tests passed locally too.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org