[ https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Pivovarov updated HIVE-9556:
--------------------------------------
    Description:
Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Example:
The Levenshtein distance between "kitten" and "sitting" is 3
1. kitten → sitten (substitution of "s" for "k")
2. sitten → sittin (substitution of "i" for "e")
3. sittin → sitting (insertion of "g" at the end).
{code}
select levenshtein('kitten', 'sitting');
3
{code}

  was:
algorithm description http://en.wikipedia.org/wiki/Levenshtein_distance
{code}
--one edit operation, greatest str len = 12
str_sim_levenshtein('Test String1', 'Test String2') = 1 - 1 / 12 = 0.91666667
{code}

> create UDF to calculate the Levenshtein distance between two strings
> --------------------------------------------------------------------
>
>                 Key: HIVE-9556
>                 URL: https://issues.apache.org/jira/browse/HIVE-9556
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch
>
>
> Levenshtein distance is a string metric for measuring the difference between
> two sequences. Informally, the Levenshtein distance between two words is the
> minimum number of single-character edits (i.e. insertions, deletions or
> substitutions) required to change one word into the other. It is named after
> Vladimir Levenshtein, who considered this distance in 1965.
> Example:
> The Levenshtein distance between "kitten" and "sitting" is 3
> 1. kitten → sitten (substitution of "s" for "k")
> 2. sitten → sittin (substitution of "i" for "e")
> 3. sittin → sitting (insertion of "g" at the end).
> {code}
> select levenshtein('kitten', 'sitting');
> 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
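
For readers who want to check the "kitten" / "sitting" example by hand, the definition in the description can be sketched in plain Java as below. This is only an illustration of the textbook dynamic-programming recurrence, not the code in the attached patches; the class name LevenshteinSketch and the method name distance are hypothetical, and the comparison is done per UTF-16 code unit, which is an assumption of this sketch.

```java
public class LevenshteinSketch {

    // Hypothetical helper (not taken from the HIVE-9556 patches).
    // Returns the minimum number of single-character insertions,
    // deletions, or substitutions needed to turn a into b.
    // Characters are compared as UTF-16 code units.
    public static int distance(String a, String b) {
        // prev[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; curr[j] is the same for i chars.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

Keeping only two rows brings memory use down to the length of the second string, which matters little for short words but is a common choice when the function is evaluated once per row of a table.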