This is an automated email from the ASF dual-hosted git repository.
joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new dfedce44b IMPALA-14157: Fix string representation of binary columns
for Python 3
dfedce44b is described below
commit dfedce44bf9bd1e0ceaa35814328fffca4cb973a
Author: Joe McDonnell <[email protected]>
AuthorDate: Mon Jun 23 21:04:23 2025 -0700
IMPALA-14157: Fix string representation of binary columns for Python 3
When running tests with Python 3, several tests are failing when
comparing the results for binary columns. Python 3 represents
binary columns as bytes. When this gets converted to a string,
it gets wrapped with a b'...', which causes difference from the
expected value (e.g. b'whatever' vs whatever). This adds decoding
logic to instead decode the bytes to a string without the added
differences. This uses 'backslashdecode' to avoid throwing an error
for invalid Unicode.
Testing:
- Ran several tests that use binary results with Python 2 and Python 3
(e.g. query_test/test_udfs.py and query_test/test_scanners.py)
Change-Id: If8b3020826a2f376815016affc7fd4c8634b3cba
Reviewed-on: http://gerrit.cloudera.org:8080/23083
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Riza Suminto <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
---
tests/common/impala_connection.py | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tests/common/impala_connection.py
b/tests/common/impala_connection.py
index 9aad86e66..a1636bb40 100644
--- a/tests/common/impala_connection.py
+++ b/tests/common/impala_connection.py
@@ -949,6 +949,13 @@ class ImpylaHS2ResultSet(object):
# Beeswax return 'false' or 'true' for boolean column.
# HS2 return 'False' or 'True'.
return str(val).lower()
+ elif col_type == 'BINARY':
+ # With Python 3, binary columns are represented as bytes. The default
string
+ # representation of bytes has an extra b'...' surrounding the actual
value.
+ # To avoid that, this decodes the bytes to a regular string. Since
binary values
+ # could be invalid Unicode, this uses 'backslashreplace' to avoid
throwing an
+ # error.
+ return val.decode(errors='backslashreplace')
else:
return str(val)