[jira] [Resolved] (IMPALA-11978) Implement Unicode sandwich for python code

Joe McDonnell (Jira) Thu, 16 Oct 2025 14:22:00 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-11978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joe McDonnell resolved IMPALA-11978.
------------------------------------
    Fix Version/s: Not Applicable
       Resolution: Duplicate

This was fixed by IMPALA-14333, and the Python tests now run with Python 3.

> Implement Unicode sandwich for python code
> ------------------------------------------
>
>                 Key: IMPALA-11978
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11978
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Infrastructure
>    Affects Versions: Impala 4.3.0
>            Reporter: Joe McDonnell
>            Priority: Major
>             Fix For: Not Applicable
>
>
> Python 3 makes a clear distinction between bytes and strings (Unicode). To 
> handle this appropriately, various places need to be clear about whether they 
> are working on Unicode strings or bytes.
> The typical way to fix this for text is to implement a "Unicode sandwich" 
> where the input path is converted to Unicode as early as possible and the 
> output path is converted to bytes as late as possible. This leaves all 
> internal code working on Unicode strings.
> Some parts of our code deal with bytes directly (e.g. 
> tests/util/get_parquet_metadata.py has code that deals with the bytes of a 
> Parquet file). Almost everything else should be dealing with Unicode strings.
> This is also a good time to fix warnings about the unicode() builtin and 
> basestring.
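
For illustration only, the "Unicode sandwich" described above can be sketched roughly as follows. This is not code from the Impala tree: the helper names (read_text, normalize, write_text, is_text) and the choice of UTF-8 are assumptions made for the example. Bytes are decoded once at the input edge, all internal logic works on str, and the result is encoded once at the output edge.

```python
import io


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Input edge: decode bytes to str as early as possible."""
    return raw.decode(encoding)


def is_text(value) -> bool:
    """Python 3 check for text; stands in for isinstance(value, basestring)."""
    return isinstance(value, str)


def normalize(text: str) -> str:
    """Internal logic: operates only on str (Unicode), never on bytes."""
    if not is_text(text):
        raise TypeError("expected str, got %s" % type(text).__name__)
    return text.strip().upper()


def write_text(text: str, sink, encoding: str = "utf-8") -> None:
    """Output edge: encode str to bytes as late as possible."""
    sink.write(text.encode(encoding))


# Hypothetical round trip through in-memory byte streams.
source = io.BytesIO("  héllo wörld\n".encode("utf-8"))
sink = io.BytesIO()
write_text(normalize(read_text(source.read())), sink)
result = sink.getvalue()
```

Code that really does operate on raw bytes (such as the Parquet metadata reader mentioned above) would stay outside this pattern and keep its values as bytes throughout.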



