[ 
https://issues.apache.org/jira/browse/HIVE-24151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24151 started by Ádám Szita.
-----------------------------------------
> MultiDelimitSerDe shifts data if strings contain non-ASCII characters
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24151
>                 URL: https://issues.apache.org/jira/browse/HIVE-24151
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>
> HIVE-22360 was intended to fix another MultiDelimitSerDe problem (with NULL
> last columns) but introduced a regression. The approach of that fix is
> flawed: the existing logic, which operated on bytes, was replaced by regex
> matcher logic that deals in character positions rather than byte positions.
> Because some non-ASCII characters are encoded as more than one byte, the
> whole record may get shifted as a result.
> With this ticket I'm going to restore the old logic and apply the proper fix
> on top of it, while keeping (and extending) the test cases added with
> HIVE-22360 so that we have a solution for both issues.
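
The character-versus-byte mismatch described above can be illustrated with a
minimal, standalone Java sketch. This is not the Hive code: the class name
OffsetDemo, the helper methods, and the "|~|" delimiter are all hypothetical,
and UTF-8 encoding is assumed. It only shows that a regex matcher reports a
character index that differs from the byte index of the same delimiter once
the row contains multi-byte characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetDemo {
    // Character index of the first delimiter, as a regex Matcher reports it.
    static int charOffset(String row, String delim) {
        Matcher m = Pattern.compile(Pattern.quote(delim)).matcher(row);
        return m.find() ? m.start() : -1;
    }

    // Byte index of the first delimiter in the UTF-8 encoding of the row.
    static int byteOffset(String row, String delim) {
        byte[] data = row.getBytes(StandardCharsets.UTF_8);
        byte[] d = delim.getBytes(StandardCharsets.UTF_8);
        outer:
        for (int i = 0; i + d.length <= data.length; i++) {
            for (int j = 0; j < d.length; j++) {
                if (data[i + j] != d[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // "Ádám|~|Szita", written with escapes to avoid source-encoding issues.
        String row = "\u00C1d\u00E1m|~|Szita";
        System.out.println("char offset: " + charOffset(row, "|~|"));
        System.out.println("byte offset: " + byteOffset(row, "|~|"));
    }
}
```

For this row the character offset is 4 and the byte offset is 6, because the
two accented letters each take two bytes in UTF-8. Applying the character
offset to the raw byte buffer would therefore cut the first field two bytes
short, and every later field would start at the wrong position.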



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
