[ 
https://issues.apache.org/jira/browse/CALCITE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-5840:
------------------------------------
    Labels: pull-request-available  (was: )

> Incorrect SOUNDEX function semantics in MySQL library
> -----------------------------------------------------
>
>                 Key: CALCITE-5840
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5840
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.34.0
>            Reporter: Ran Tao
>            Assignee: Ran Tao
>            Priority: Major
>              Labels: pull-request-available
>
> There are 2 problems.
> 1. 4-char fixed length & mysql non-fixed length
> current in calcite:
> SELECT SOUNDEX('Quadratically');
>     -> 'Q363'
> mysql
> mysql> SELECT SOUNDEX('Quadratically');
>     -> 'Q36324'
> As docs illustrated below, A standard soundex string is four characters long, 
> but the MySQL SOUNDEX() function {*}returns an arbitrarily long string{*}.
> [https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex]
> [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_soundex]
> 2. when current calcite mysql soundex deal with multi bytes string such as 
> UTF8, it will cause IllegalArgumentException because we use Apache Commons 
> Codec to get the result.
> however, MySQL is not this behavior. e.g.
> {code:java}
> mysql> select soundex('字节');
> +-------------------+
> | soundex('字节')   |
> +-------------------+
> | 字000             |
> +-------------------+
> 1 row in set (0.00 sec)
> mysql> select soundex('字节字节');
> +-------------------------+
> | soundex('字节字节')     |
> +-------------------------+
> | 字000                   |
> +-------------------------+
> 1 row in set (0.10 sec)
> mysql> select soundex('字节字节字节');
> +-------------------------------+
> | soundex('字节字节字节')       |
> +-------------------------------+
> | 字000                         |
> +-------------------------------+
> 1 row in set (0.03 sec)
> mysql> select soundex('バイト');
> +----------------------+
> | soundex('バイト')    |
> +----------------------+
> | バ000                |
> +----------------------+
> 1 row in set (0.04 sec)
> mysql> select soundex('байт');
> +---------------------+
> | soundex('байт')     |
> +---------------------+
> | б000                |
> +---------------------+
> 1 row in set (0.00 sec) {code}
> the different string parameter above means 'BYTE'.
> I think we need to correct the behavior of this function in MySQL library.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to