This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 001263f58a5275e188bd57be68f76cb271cd7992
Author: Joe McDonnell <[email protected]>
AuthorDate: Sun Oct 26 13:39:28 2025 -0700

    IMPALA-14514: Handle serializing bytes in bin/run-workload.py

    On python 3, when Impyla receives a result with a string
    that is not valid UTF-8, it returns that as bytes. TPC-DS
    Q30 on scale 20 has a result that contains invalid UTF-8,
    so bin/run-workload.py can fail while trying to dump this
    to JSON.

    This modifies CustomJSONEncoder to handle serializing bytes
    by converting it to a string with invalid unicode handled
    with backslashes.

    Testing:
     - Ran bin/run-workload.py against TPC-DS scale 20

    Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea
    Reviewed-on: http://gerrit.cloudera.org:8080/23602
    Reviewed-by: Riza Suminto <[email protected]>
    Reviewed-by: Jason Fehr <[email protected]>
    Tested-by: Joe McDonnell <[email protected]>
---
 bin/run-workload.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/bin/run-workload.py b/bin/run-workload.py
index 78d118ee2..b99b2d41a 100755
--- a/bin/run-workload.py
+++ b/bin/run-workload.py
@@ -145,6 +145,11 @@ class CustomJSONEncoder(json.JSONEncoder):
     if isinstance(obj, datetime):
       # Convert datetime into an standard iso string
       return obj.isoformat()
+    if isinstance(obj, bytes):
+      # Impyla can leave a string value as bytes when it is unable to decode it to UTF-8.
+      # TPC-DS has queries that produce non-UTF-8 results (e.g. Q30 on scale 20)
+      # Convert bytes to strings to make JSON encoding work
+      return obj.decode(encoding="utf-8", errors="backslashreplace")
     elif isinstance(obj, (Query, HiveQueryResult, QueryExecConfig, TableFormatInfo)):
       # Serialize these objects manually by returning their __dict__ methods.
       return obj.__dict__
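
For illustration only, here is a minimal standalone sketch of the behavior this change relies on. It is not the code in bin/run-workload.py: the class name BytesTolerantEncoder and the sample row are hypothetical, and only the datetime and bytes branches are reproduced. It shows bytes that are not valid UTF-8 passing through json.dumps without an exception.

```python
import json
from datetime import datetime


class BytesTolerantEncoder(json.JSONEncoder):
  """Hypothetical stand-in for CustomJSONEncoder, reduced to two branches."""

  def default(self, obj):
    if isinstance(obj, datetime):
      return obj.isoformat()
    if isinstance(obj, bytes):
      # Bytes that are not valid UTF-8 become literal \xNN escapes
      # instead of raising UnicodeDecodeError.
      return obj.decode(encoding="utf-8", errors="backslashreplace")
    # Anything else falls through to the base class, which raises TypeError.
    return super().default(obj)


# Made-up result row: b"\xff" can never appear in valid UTF-8.
row = {"value": b"caf\xff", "ok": b"plain"}
encoded = json.dumps(row, cls=BytesTolerantEncoder)
decoded = json.loads(encoded)
```

Note that the conversion is one-way: after a round trip through JSON, the value is a string containing a literal backslash followed by "xff", not the original bytes.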
