[
https://issues.apache.org/jira/browse/IMPALA-14157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985906#comment-17985906
]
ASF subversion and git services commented on IMPALA-14157:
----------------------------------------------------------
Commit dfedce44bf9bd1e0ceaa35814328fffca4cb973a in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dfedce44b ]
IMPALA-14157: Fix string representation of binary columns for Python 3
When running tests with Python 3, several tests fail when
comparing the results for binary columns. Python 3 represents
binary columns as bytes. When a bytes value is converted to a string,
it gets wrapped in b'...', which makes it differ from the
expected value (e.g. b'whatever' vs whatever). This adds
logic to decode the bytes to a string without the added
wrapper. This uses 'backslashreplace' to avoid throwing an error
for invalid Unicode.
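
The behavior described above can be reproduced in a few lines. This is a minimal sketch, assuming Python 3 and UTF-8 as the target encoding; `binary_to_text` is a hypothetical helper name used only for illustration and is not taken from the Impala test framework.

```python
def binary_to_text(value):
    """Hypothetical helper: render a binary column value as text.

    Bytes are decoded as UTF-8 with the 'backslashreplace' error handler,
    so invalid sequences become \\xNN escapes instead of raising
    UnicodeDecodeError. Non-bytes values are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", "backslashreplace")
    return value


raw = b"whatever"

# On Python 3, str() on bytes yields the repr, including the b'...' wrapper.
print(str(raw))

# Decoding first gives the bare text that the expected results contain.
print(binary_to_text(raw))

# Invalid UTF-8 bytes are escaped rather than causing an exception.
print(binary_to_text(b"\xff\xfeok"))
```

The 'backslashreplace' handler keeps the output deterministic for arbitrary binary data, which matters when the decoded string is compared against a fixed expected value.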
Testing:
- Ran several tests that use binary results with Python 2 and Python 3
(e.g. query_test/test_udfs.py and query_test/test_scanners.py)
Change-Id: If8b3020826a2f376815016affc7fd4c8634b3cba
Reviewed-on: http://gerrit.cloudera.org:8080/23083
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Riza Suminto <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
> Python 3 displays binary types differently than Python 2
> --------------------------------------------------------
>
> Key: IMPALA-14157
> URL: https://issues.apache.org/jira/browse/IMPALA-14157
> Project: IMPALA
> Issue Type: Sub-task
> Components: Infrastructure
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
>
> Running pytests with Python 3 results in test failures for binary types, e.g.:
> {noformat}
> query_test/test_scanners.py:207: in test_partition_columns
> self.run_test_case('QueryTest/iceberg-virtual-partition-columns', vector)
> common/impala_test_suite.py:915: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:745: in __verify_results_and_errors
> verify_raw_results(test_section, result, vector,
> common/test_result_verifier.py:523: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:305: in verify_query_result_is_equal
> assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E
> 0,'dHJ1ZQ==.MQ==.MTE=.MS4x.Mi4yMjI=.MTIzLjMyMQ==.MTkwNDU=.aW1wYWxh',1,true,1,11,1.100000023841858,2.222,123.321,2022-02-22,'impala'
> !=
> 0,'b'dHJ1ZQ==.MQ==.MTE=.MS4x.Mi4yMjI=.MTIzLjMyMQ==.MTkwNDU=.aW1wYWxh'',1,true,1,11,1.100000023841858,2.222,123.321,2022-02-22,'impala'
> E
> 0,'dHJ1ZQ==.MQ==.MTE=.MS4x.Mi4yMjI=.MTIzLjMyMQ==.MTkwNDU=.aW1wYWxh',2,true,1,11,1.100000023841858,2.222,123.321,2022-02-22,'impala'
> !=
> 0,'b'dHJ1ZQ==.MQ==.MTE=.MS4x.Mi4yMjI=.MTIzLjMyMQ==.MTkwNDU=.aW1wYWxh'',2,true,1,11,1.100000023841858,2.222,123.321,2022-02-22,'impala'{noformat}
> Python 3 sees the columns as bytes and prints them in the b'...' format, which
> doesn't match the expected result. We should decode them into strings
> (probably with handling for invalid Unicode) before the comparison.