[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

STINNER Victor Sat, 18 Apr 2015 15:30:07 -0700

STINNER Victor added the comment:

> "if you are using the C locale you or the OS are broken anyway, so we'll just 
> pass the bytes through"


Exactly. Even though you use the Python 3 str type (Unicode), you effectively 
store the text as raw bytes, in a custom format: each undecodable byte becomes 
a lone surrogate character.
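
To make the "custom format" concrete, here is a minimal sketch of the round 
trip. The sample bytes are made up for illustration, and the "ascii" codec 
stands in for the encoding Python picks under the C locale:

```python
# Latin-1 encoded bytes: 0xE9 is not valid ASCII.
raw = b"caf\xe9"

# With surrogateescape, the undecodable byte 0xE9 is mapped to the lone
# surrogate U+DCE9 instead of raising UnicodeDecodeError.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same error handler restores the original byte.
back = text.encode("ascii", errors="surrogateescape")

print(ascii(text))  # 'caf\udce9'
print(back == raw)  # True
```

The str object carries the byte unchanged, but it is not real text: encoding 
it without the error handler (for example to UTF-8) raises UnicodeEncodeError.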

> I'm not entirely convinced this won't cause issues, but I suppose it might 
> not cause any more issues than having things break due to the C locale does.

The most obvious issue is the return of mojibake. Since you manipulate raw 
bytes, it's easy to concatenate two byte strings encoded with two different 
encodings.
https://unicodebook.readthedocs.org/definitions.html#mojibake
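
A small sketch of how that can happen, assuming two hypothetical inputs that 
hold the same character in different encodings and are both decoded with 
surrogateescape:

```python
# The same character, "é", arriving from two sources in two encodings.
from_utf8 = "é".encode("utf-8")       # b'\xc3\xa9'
from_latin1 = "é".encode("latin-1")   # b'\xe9'

# Both decode without error, so nothing warns that the encodings differ.
a = from_utf8.decode("ascii", errors="surrogateescape")
b = from_latin1.decode("ascii", errors="surrogateescape")

# Concatenating the strings concatenates the underlying raw bytes.
mixed = (a + b).encode("ascii", errors="surrogateescape")
print(mixed)  # b'\xc3\xa9\xe9'

# No single encoding decodes the result correctly any more.
print(mixed.decode("latin-1"))  # Ã©é
try:
    mixed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```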

The question is not how bad it is to manipulate text as bytes. The problem is 
that a working application written for Python 2 starts to fail randomly (on 
non-ASCII characters) on Python 3 when the LC_CTYPE locale is the POSIX locale 
("C"). The first question then is: should I keep Python 2, or write my 
application in a language which doesn't force me to understand Unicode?
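
The failure and the proposed behaviour can be sketched as follows. This is 
only an illustration: encoding="ascii" is passed explicitly to stand in for 
what open() would select under the C locale, and the file content is made up.

```python
import os
import tempfile

# A file containing non-ASCII text, stored as UTF-8 bytes.
data = "café\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
copy_path = path + ".copy"

try:
    # Strict error handler: reading fails on the first non-ASCII byte.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # surrogateescape: reading succeeds and the bytes pass through.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        text = f.read()

    # Writing with the same handler reproduces the original bytes.
    with open(copy_path, "w", encoding="ascii",
              errors="surrogateescape", newline="") as f:
        f.write(text)
    with open(copy_path, "rb") as f:
        round_tripped = f.read()
finally:
    for p in (path, copy_path):
        if os.path.exists(p):
            os.remove(p)

print(strict_failed)          # True
print(round_tripped == data)  # True
```

With the strict handler the program only breaks once a non-ASCII byte shows 
up in its input, which is why the failure looks random to the user.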

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23993>
_______________________________________