[ 
https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9556:
--------------------------------------
    Status: Patch Available  (was: In Progress)

> create UDF to calculate the Levenshtein distance between two strings
> --------------------------------------------------------------------
>
>                 Key: HIVE-9556
>                 URL: https://issues.apache.org/jira/browse/HIVE-9556
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch, HIVE-9556.3.patch
>
>
> Levenshtein distance is a string metric for measuring the difference between 
> two sequences. Informally, the Levenshtein distance between two words is the 
> minimum number of single-character edits (i.e. insertions, deletions or 
> substitutions) required to change one word into the other. It is named after 
> Vladimir Levenshtein, who considered this distance in 1965.
> Example:
> The Levenshtein distance between "kitten" and "sitting" is 3:
> 1. kitten → sitten (substitution of "s" for "k")
> 2. sitten → sittin (substitution of "i" for "e")
> 3. sittin → sitting (insertion of "g" at the end).
> {code}
> select levenshtein('kitten', 'sitting');
> 3
> {code}
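
For illustration only, the metric described above can be sketched in Python with the standard two-row dynamic-programming (Wagner-Fischer) approach. This is not the code from the attached patches; the Python function name levenshtein and its parameters a and b are assumptions chosen to mirror the UDF call shown in the issue.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # The distance is symmetric, so keep the shorter string in the inner
    # loop; this only keeps the two working rows small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the prefix of a processed so far
    # and the first j characters of b. Initially that prefix is empty, so the
    # distance is j insertions.
    previous = list(range(len(b) + 1))

    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into an empty string takes
        # i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The sketch uses O(len(a) * len(b)) time and O(min(len(a), len(b))) extra space. It compares Python characters (Unicode code points), which may differ from how a Java-based UDF counts characters outside the Basic Multilingual Plane.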



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
