STINNER Victor added the comment:

Sorry, I still haven't had enough time to read PEP 538 carefully. But since the
discussion has already started on this issue, I will add my comments:

* I'm sure that many Linux, UNIX and BSD systems don't have the "C.UTF-8" 
locale. For example, HP-UX has "C.utf8" which is not exactly "C.UTF-8".

* Setting the locale has an impact on all libraries running in the Python 
process. At this point, I'm not sure that it is what we want.

* I'm not sure that it's ok in 2017 to always force the UTF-8 encoding if the
user locale uses a different encoding. I had the same concern with PEP 528
(Change Windows console encoding to UTF-8) and PEP 529 (Change Windows
filesystem encoding to UTF-8) on Windows, but these PEPs were approved and
merged into Python 3.6. My fear is obviously mojibake when exchanging data with
other applications that still use the locale encoding: they are not affected
by setlocale() in the Python process.

* I proposed an opt-in option to force UTF-8: the -X utf8 command line option
and the PYTHONUTF8=1 env var. An opt-in will obviously reduce the risk of
backward compatibility issues. With an opt-in option, users are better
prepared for mojibake issues.

* I dislike the "Backporting to earlier Python 3 releases" part. In my
experience, changes to how Python handles text (encodings, codecs, etc.)
always have subtle issues, and users dislike getting backward incompatible
changes in minor releases. *Maybe* if the option is opt-in, the risk is lower
and acceptable?

* I dislike that Fedora has such a downstream change. I would prefer to decide
upstream how to slowly make UTF-8 a first-class citizen in Python. Otherwise,
Fedora would behave differently from other Linux distributions, and it can be
painful to write applications that behave the same on all Linux
distributions. But I also understand that Fedora sometimes has to move faster
than the slow CPython project :-) Fedora can also be seen as a testbed for
experimenting with changes quickly, which provides wide feedback upstream to
make better decisions.

* Choosing between the strict and surrogateescape error handlers is a very
important decision with a wide impact. If we use UTF-8 by default (PEP 538),
people will probably complain less if Python magically passes undecodable
bytes through thanks to surrogateescape. If the option is opt-in, strict may
make sense. But surrogateescape is maybe still more "convenient". I don't know
at this point.
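A minimal sketch of the trade-off in the last point, using only the built-in
codecs (the helper name decode_utf8 and the sample bytes are illustrative, not
taken from PEP 538): strict fails immediately on undecodable bytes, while
surrogateescape smuggles them through as lone surrogates that round-trip back
to the same bytes but still fail later if encoded strictly. The last line shows
the mojibake you get when UTF-8 bytes are read by an application whose locale
encoding is Latin-1.

```python
def decode_utf8(data, errors):
    """Decode bytes as UTF-8 with the given error handler; None if it fails."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError:
        return None


# "café" written by an application whose locale encoding is Latin-1:
# b"caf\xe9", where the lone byte 0xE9 is not valid UTF-8.
latin1_bytes = "café".encode("latin-1")

# strict: the undecodable byte raises, so decoding fails up front.
strict_result = decode_utf8(latin1_bytes, "strict")  # None

# surrogateescape: 0xE9 is mapped to the lone surrogate U+DCE9 ...
escaped = decode_utf8(latin1_bytes, "surrogateescape")  # "caf\udce9"

# ... which round-trips to the original bytes with the same handler ...
roundtrip = escaped.encode("utf-8", "surrogateescape")  # b"caf\xe9"

# ... but the error is only deferred: a strict UTF-8 encode still fails.
try:
    escaped.encode("utf-8")
    leak_detected = False
except UnicodeEncodeError:
    leak_detected = True

# Mojibake in the other direction: UTF-8 bytes decoded as Latin-1.
mojibake = "café".encode("utf-8").decode("latin-1")  # "cafÃ©"
```

In other words, surrogateescape does not remove the error, it moves it to
whichever code later writes the string with a strict encoder.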

Nick: it seems like you have a well-defined plan. But I disagree on multiple
points. I don't know if it's better to try to convince you to change your PEP,
or to write a different PEP.

I have been planning to write such a "UTF-8" PEP since 2015, but I never
started because the scope is so large that I fear all the tiny but annoying
corner cases...
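To illustrate the first two points of this comment (the name of the UTF-8 "C"
locale differs between systems, and setlocale() changes process-wide state),
here is a minimal sketch that probes a few candidate spellings. The function
name find_utf8_c_locale and the candidate list are hypothetical; which names
are accepted depends entirely on the platform and the installed locales.

```python
import locale

# Hypothetical candidate spellings of a UTF-8 "C" locale. Availability varies:
# e.g. "C.UTF-8" on many Linux systems, "C.utf8" on HP-UX (as noted above).
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def find_utf8_c_locale(candidates=CANDIDATES):
    """Return the first candidate accepted for LC_CTYPE, or None.

    setlocale() affects every library running in the process, so the
    previous LC_CTYPE setting is always restored before returning.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed or not recognized on this system
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)
```

Because the result can legitimately be None, any code relying on a UTF-8 "C"
locale needs a fallback path on systems that lack one.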

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28180>
_______________________________________
_______________________________________________
Python-bugs-list mailing list