[
https://issues.apache.org/jira/browse/CALCITE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-5840:
------------------------------------
Labels: pull-request-available (was: )
> Incorrect SOUNDEX function semantics in MySQL library
> -----------------------------------------------------
>
> Key: CALCITE-5840
> URL: https://issues.apache.org/jira/browse/CALCITE-5840
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.34.0
> Reporter: Ran Tao
> Assignee: Ran Tao
> Priority: Major
> Labels: pull-request-available
>
> There are 2 problems.
> 1. 4-char fixed length & mysql non-fixed length
> current in calcite:
> SELECT SOUNDEX('Quadratically');
> -> 'Q363'
> mysql
> mysql> SELECT SOUNDEX('Quadratically');
> -> 'Q36324'
> As docs illustrated below, A standard soundex string is four characters long,
> but the MySQL SOUNDEX() function {*}returns an arbitrarily long string{*}.
> [https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex]
> [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_soundex]
> 2. when current calcite mysql soundex deal with multi bytes string such as
> UTF8, it will cause IllegalArgumentException because we use Apache Commons
> Codec to get the result.
> however, MySQL is not this behavior. e.g.
> {code:java}
> mysql> select soundex('字节');
> +-------------------+
> | soundex('字节') |
> +-------------------+
> | 字000 |
> +-------------------+
> 1 row in set (0.00 sec)
> mysql> select soundex('字节字节');
> +-------------------------+
> | soundex('字节字节') |
> +-------------------------+
> | 字000 |
> +-------------------------+
> 1 row in set (0.10 sec)
> mysql> select soundex('字节字节字节');
> +-------------------------------+
> | soundex('字节字节字节') |
> +-------------------------------+
> | 字000 |
> +-------------------------------+
> 1 row in set (0.03 sec)
> mysql> select soundex('バイト');
> +----------------------+
> | soundex('バイト') |
> +----------------------+
> | バ000 |
> +----------------------+
> 1 row in set (0.04 sec)
> mysql> select soundex('байт');
> +---------------------+
> | soundex('байт') |
> +---------------------+
> | б000 |
> +---------------------+
> 1 row in set (0.00 sec) {code}
> the different string parameter above means 'BYTE'.
> I think we need to correct the behavior of this function in MySQL library.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)