[ https://issues.apache.org/jira/browse/HIVE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Pivovarov updated HIVE-9738:
--------------------------------------
    Status: Patch Available  (was: In Progress)

> create SOUNDEX udf
> ------------------
>
>                 Key: HIVE-9738
>                 URL: https://issues.apache.org/jira/browse/HIVE-9738
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-9738.1.patch
>
>
> Soundex is an encoding used to relate similar names, but it can also be used
> as a general-purpose scheme to find words with similar phonemes.
> The American Soundex System
> The Soundex code consists of the first letter of the name followed by three
> digits. The digits are determined by dropping the letters a, e, i, o, u, h, w
> and y (after the first letter) and encoding the remaining letters of the name
> according to the table below. There are only two additional rules: (1) if two
> or more consecutive letters have the same code, they are coded as one letter;
> (2) if there are too few letters to make three digits, the remaining digits
> are set to zero.
> Soundex Table
>  1 b,f,p,v
>  2 c,g,j,k,q,s,x,z
>  3 d,t
>  4 l
>  5 m,n
>  6 r
> Examples:
> Miller M460
> Peterson P362
> Peters P362
> Auerbach A612
> Uhrbach U612
> Moskowitz M232
> Moskovitz M213
> Implementation:
> http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html
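
The rules and table quoted above can be sketched in a few lines. This is only an illustration in Python, not the attached patch and not the Commons Codec implementation. The function name `soundex` and the handling of edge cases are assumptions: non-letters are ignored, an empty input gives an empty string, the first letter's own code counts for the duplicate rule, and h and w are skipped without separating letters that share a code, while vowels and y do separate them.

```python
# Letter-to-digit table copied from the issue description.
SOUNDEX_TABLE = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
# Invert the table: one entry per coded letter, e.g. {"b": "1", "f": "1", ...}.
CODE_OF = {
    letter: digit
    for digit, letters in SOUNDEX_TABLE.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no ASCII letters."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""

    code = letters[0].upper()
    # Assumption: the first letter's code takes part in the duplicate rule.
    previous = CODE_OF.get(letters[0], "")

    for ch in letters[1:]:
        if ch in "hw":
            # Assumption: h and w are dropped without resetting `previous`.
            continue
        digit = CODE_OF.get(ch, "")  # "" for the vowels and y
        if digit and digit != previous:
            code += digit
            if len(code) == 4:
                break
        # A vowel or y resets `previous`, so a repeated code after it counts.
        previous = digit

    # Rule (2): pad with zeros when fewer than three digits were produced.
    return code.ljust(4, "0")


if __name__ == "__main__":
    for name in ("Miller", "Peterson", "Peters", "Auerbach",
                 "Uhrbach", "Moskowitz", "Moskovitz"):
        print(name, soundex(name))
```

Run as a script, the sketch prints each name from the examples in the issue next to its computed code, which makes it easy to compare against the expected values listed there.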



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
