[
https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949770#comment-17949770
]
ASF subversion and git services commented on IMPALA-12927:
----------------------------------------------------------
Commit 4fea5260cf77bbbfa23457ff15907f5aef55266e in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4fea5260c ]
IMPALA-14030: Fix buffer underflow when base64 decoding 0 length binaries

The issue didn't cause problems under normal circumstances but ASAN
tests caught it in JSON tests enabled in IMPALA-12927.

Changed text parsing logic to skip base64 decoding for empty binaries.
Also fixed Base64DecodeBufLen() with len=0 and added unit tests, though
this function is not used with len=0 outside BE tests.
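
The underflow described above can be illustrated with a minimal sketch. This
is not Impala's code: DecodedLen is a hypothetical helper, and it assumes the
length calculation inspects the trailing '=' padding characters of the input.
Under that assumption, an input of length 0 would make the padding check read
in[-1], one byte before the buffer, unless the empty case is handled first.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only (not Impala's Base64DecodeBufLen).
// Computes the number of bytes produced by decoding a padded base64 input.
// Returns false if the input length is not a multiple of 4.
bool DecodedLen(const char* in, std::size_t in_len, std::size_t* out_len) {
  assert(out_len != nullptr);
  if (in_len % 4 != 0) return false;

  // Handle the empty input before touching the buffer. Without this early
  // return, the padding checks below would evaluate in[in_len - 1] with
  // in_len == 0, which reads before the start of the buffer.
  if (in_len == 0) {
    *out_len = 0;
    return true;
  }

  // Every 4 input characters decode to 3 bytes, minus one byte per
  // trailing '=' padding character (at most two).
  std::size_t len = in_len / 4 * 3;
  if (in[in_len - 1] == '=') {
    --len;
    if (in[in_len - 2] == '=') --len;
  }
  *out_len = len;
  return true;
}
```

A caller that parses text into a binary column could apply the same idea one
level up and skip the decode step entirely when the value is empty, which is
the approach the commit message describes for the text parsing logic.
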
Change-Id: I511cff8cec319d03d494a342f2cbb4a251cb893e
Reviewed-on: http://gerrit.cloudera.org:8080/22855
Reviewed-by: Riza Suminto
Reviewed-by: Zoltan Borok-Nagy
Tested-by: Impala Public Jenkins
> Support reading BINARY columns in JSON tables
> ---------------------------------------------
>
> Key: IMPALA-12927
> URL: https://issues.apache.org/jira/browse/IMPALA-12927
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Csaba Ringhofer
> Assignee: Zihao Ye
> Priority: Major
>
> Currently, Impala cannot correctly read BINARY columns in JSON files written
> by Hive and returns runtime errors:
> {code}
> select * from functional_json.binary_tbl;
> +----+--------------+------------+
> | id | string_col   | binary_col |
> +----+--------------+------------+
> | 1  | ascii        | NULL       |
> | 2  | ascii        | NULL       |
> | 3  | null         | NULL       |
> | 4  | empty        |            |
> | 5  | valid utf8   | NULL       |
> | 6  | valid utf8   | NULL       |
> | 7  | invalid utf8 | NULL       |
> | 8  | invalid utf8 | NULL       |
> +----+--------------+------------+
> WARNINGS: Error converting column: functional_json.binary_tbl.binary_col,
> type: STRING, data: 'binary1'
> Error parsing row: file:
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING,
> data: 'binary2'
> Error parsing row: file:
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING,
> data: 'árvíztűrőtükörfúró'
> Error parsing row: file:
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING,
> data: '你好hello'
> Error parsing row: file:
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING,
> data: '��'
> Error parsing row: file:
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING,
> data: '�D3"'
> Error parsing row: file:
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before
> offset: 481
> {code}
> The single file in the table looks like this:
> {code}
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0
> {"id":1,"string_col":"ascii","binary_col":"binary1"}
> {"id":2,"string_col":"ascii","binary_col":"binary2"}
> {"id":3,"string_col":"null","binary_col":null}
> {"id":4,"string_col":"empty","binary_col":""}
> {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
> {"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
> {"id":7,"string_col":"invalid utf8","binary_col":"\u0000�\u0000�"}
> {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u0000"}
> {code}
>