Joe McDonnell created IMPALA-14514:
--------------------------------------
Summary: bin/run-workload.py needs to handle serializing bytes /
invalid UTF-8 to JSON
Key: IMPALA-14514
URL: https://issues.apache.org/jira/browse/IMPALA-14514
Project: IMPALA
Issue Type: Bug
Components: Infrastructure
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
On Python 3, when Impyla receives a result with a string that is not valid
UTF-8, it returns that value as bytes. TPC-DS Q30 on scale 20 has a result that
contains invalid UTF-8, so bin/run-workload.py can fail while trying to dump
it to JSON:
{noformat}
18:49:20 Traceback (most recent call last):
18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 289, in <module>
18:49:20     json.dump(result_map, f, cls=CustomJSONEncoder, ensure_ascii=False)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/__init__.py", line 179, in dump
18:49:20     for chunk in iterable:
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
18:49:20     yield from _iterencode_dict(o, _current_indent_level)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 439, in _iterencode
18:49:20     yield from _iterencode(o, _current_indent_level)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
18:49:20     yield from _iterencode_dict(o, _current_indent_level)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 438, in _iterencode
18:49:20     o = _default(o)
18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 152, in default
18:49:20     super(CustomJSONEncoder, self).default(obj)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 179, in default
18:49:20     raise TypeError(f'Object of type {o.__class__.__name__} '
{noformat}
We should change CustomJSONEncoder to handle bytes.
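
One possible shape for that change, as a minimal sketch only. The class name BytesSafeJSONEncoder and the sample result_map are hypothetical stand-ins, not the actual code in bin/run-workload.py, and the choice of the "backslashreplace" error handler is an assumption (using "replace" to substitute U+FFFD would be another option). The idea is to decode bytes in default() so that invalid sequences are preserved as visible escapes instead of raising TypeError:

```python
import json


class BytesSafeJSONEncoder(json.JSONEncoder):
    """Hypothetical stand-in for CustomJSONEncoder, not the real Impala class."""

    def default(self, obj):
        if isinstance(obj, (bytes, bytearray)):
            # Assumption: keep invalid bytes visible as literal \xNN escapes
            # rather than dropping them or failing the whole dump.
            return bytes(obj).decode("utf-8", errors="backslashreplace")
        # Anything else still falls through to the stock TypeError.
        return super().default(obj)


# Made-up result rows: one str, one valid UTF-8 bytes value, one invalid one.
result_map = {"rows": [["valid", b"caf\xc3\xa9", b"bad\xff\xfebytes"]]}
encoded = json.dumps(result_map, cls=BytesSafeJSONEncoder, ensure_ascii=False)
print(encoded)
```

With this approach, bytes that happen to be valid UTF-8 round-trip as ordinary strings, while invalid bytes remain distinguishable in the JSON output. Whether that is the right representation for the workload results is a judgment call for the actual fix.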