[
https://issues.apache.org/jira/browse/IMPALA-11978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell resolved IMPALA-11978.
------------------------------------
Fix Version/s: Not Applicable
Resolution: Duplicate
This was fixed by IMPALA-14333, and the Python tests now run with Python 3.
> Implement Unicode sandwich for python code
> ------------------------------------------
>
> Key: IMPALA-11978
> URL: https://issues.apache.org/jira/browse/IMPALA-11978
> Project: IMPALA
> Issue Type: Sub-task
> Components: Infrastructure
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Priority: Major
> Fix For: Not Applicable
>
>
> Python 3 makes a clear distinction between bytes and strings (Unicode). To
> handle this appropriately, various places need to be clear about whether they
> are working on Unicode strings or bytes.
> The typical way to fix this for text is to implement a "Unicode sandwich"
> where the input path is converted to Unicode as early as possible and the
> output path is converted to bytes as late as possible. This leaves all
> internal code working on Unicode strings.
> Some parts of our code deal with bytes directly (e.g.
> tests/util/get_parquet_metadata.py has code that deals with the bytes of a
> Parquet file). Almost everything else should be dealing with Unicode strings.
> This is also a good time to fix warnings about the unicode() builtin and
> basestring.
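
As a rough illustration of the "Unicode sandwich" described in the quoted text, here is a minimal Python 3 sketch. It is not taken from the Impala code base: the function names (read_text, normalize, write_text, to_text, demo) and the choice of UTF-8 are assumptions made for the example. Bytes are decoded once at the input edge, the internal code works only on str, and the result is encoded once at the output edge.

```python
import io


def read_text(stream, encoding="utf-8"):
    """Input edge: decode bytes to str as early as possible."""
    return stream.read().decode(encoding)


def normalize(text):
    """Internal code: works only on str and never sees bytes."""
    return "\n".join(line.strip().upper() for line in text.splitlines())


def write_text(stream, text, encoding="utf-8"):
    """Output edge: encode str to bytes as late as possible."""
    stream.write(text.encode(encoding))


def to_text(value, encoding="utf-8"):
    """Hypothetical helper for call sites that may receive bytes or str.

    In Python 3, str takes the place of both unicode() and basestring,
    so a check against bytes is enough to tell the two cases apart.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def demo():
    """Run the sandwich end to end on in-memory byte streams."""
    raw_in = io.BytesIO("  caf\u00e9 \n na\u00efve ".encode("utf-8"))
    raw_out = io.BytesIO()
    write_text(raw_out, normalize(read_text(raw_in)))
    return raw_out.getvalue()
```

Code that legitimately handles raw bytes, such as a Parquet file reader, would stay outside this pattern and keep working on bytes directly.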