[ 
https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9556:
--------------------------------------
    Description: 
Levenshtein distance is a string metric for measuring the difference between 
two sequences. Informally, the Levenshtein distance between two words is the 
minimum number of single-character edits (i.e. insertions, deletions or 
substitutions) required to change one word into the other. It is named after 
Vladimir Levenshtein, who considered this distance in 1965.

Example:
The Levenshtein distance between "kitten" and "sitting" is 3, since no sequence of fewer than three edits transforms one into the other:
1. kitten → sitten (substitution of "s" for "k")
2. sitten → sittin (substitution of "i" for "e")
3. sittin → sitting (insertion of "g" at the end).
{code}
select levenshtein('kitten', 'sitting');
3
{code}
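
For illustration, the definition above can be sketched with the textbook two-row dynamic-programming (Wagner-Fischer) computation. This is a minimal Python sketch and not the code from the attached patches; the function name `levenshtein` is only chosen here to mirror the UDF name, and the implementation details are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints 3
```

The sketch keeps only two rows of the distance table, so memory grows with the length of the shorter string rather than with the product of both lengths.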

  was:
algorithm description http://en.wikipedia.org/wiki/Levenshtein_distance
{code}
--one edit operation, greatest str len = 12
str_sim_levenshtein('Test String1', 'Test String2') = 1 - 1 / 12 = 0.91666667
{code}


> create UDF to calculate the Levenshtein distance between two strings
> --------------------------------------------------------------------
>
>                 Key: HIVE-9556
>                 URL: https://issues.apache.org/jira/browse/HIVE-9556
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch
>
>
> Levenshtein distance is a string metric for measuring the difference between 
> two sequences. Informally, the Levenshtein distance between two words is the 
> minimum number of single-character edits (i.e. insertions, deletions or 
> substitutions) required to change one word into the other. It is named after 
> Vladimir Levenshtein, who considered this distance in 1965.
> Example:
> The Levenshtein distance between "kitten" and "sitting" is 3, since no sequence of fewer than three edits transforms one into the other:
> 1. kitten → sitten (substitution of "s" for "k")
> 2. sitten → sittin (substitution of "i" for "e")
> 3. sittin → sitting (insertion of "g" at the end).
> {code}
> select levenshtein('kitten', 'sitting');
> 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
