[ https://issues.apache.org/jira/browse/HIVE-24151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-24151 started by Ádám Szita.
-----------------------------------------
> MultiDelimitSerDe shifts data if strings contain non-ASCII characters
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24151
>                 URL: https://issues.apache.org/jira/browse/HIVE-24151
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>
> HIVE-22360 intended to fix another MultiDelimitSerDe problem (with NULL last
> columns) but introduced a regression: the approach of the fix is pretty much
> all wrong, as the existing logic that operated on bytes was replaced by regex
> matcher logic, which deals in character positions rather than byte positions.
> As some non-ASCII characters consist of more than one byte, the whole record
> may get shifted as a result.
> With this ticket I'm going to restore the old logic and apply the proper fix
> on top of it, while keeping (and extending) the test cases added with
> HIVE-22360, so that we have a solution for both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
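
For illustration only, here is a minimal Java sketch of the mismatch described in the ticket: a regex matcher reports the delimiter position in characters, while a scan over the encoded row reports it in bytes. It is not Hive's code. The class name OffsetMismatchDemo, the helpers charOffset and byteOffset, the "|~|" delimiter and the UTF-8 encoding are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hive.
class OffsetMismatchDemo {

    // Position of the first delimiter as a regex matcher sees it: a character index.
    static int charOffset(String row, String delimiter) {
        Matcher m = Pattern.compile(Pattern.quote(delimiter)).matcher(row);
        return m.find() ? m.start() : -1;
    }

    // Position of the first delimiter in the encoded row: a byte index.
    // UTF-8 is assumed here for the sake of the example.
    static int byteOffset(String row, String delimiter) {
        byte[] data = row.getBytes(StandardCharsets.UTF_8);
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i + delim.length <= data.length; i++) {
            int j = 0;
            while (j < delim.length && data[i + j] == delim[j]) {
                j++;
            }
            if (j == delim.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String delimiter = "|~|";
        String ascii = "Adam" + delimiter + "Szita";
        // Unicode escapes for the accented letters keep the source file pure ASCII.
        String accented = "\u00C1d\u00E1m" + delimiter + "Szita";

        System.out.println("ASCII row:    char=" + charOffset(ascii, delimiter)
                + " byte=" + byteOffset(ascii, delimiter));
        System.out.println("accented row: char=" + charOffset(accented, delimiter)
                + " byte=" + byteOffset(accented, delimiter));
    }
}
```

For the ASCII row the two offsets agree. For the accented row the character offset is smaller than the byte offset, because each accented letter takes two bytes in UTF-8. Using the character offset to slice the byte buffer would therefore cut the field two bytes early, which is the kind of shift the ticket describes.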