Hi all,

Over the past week or so, I noticed failures in the Azure Pipelines CI (see https://github.com/twisted/twisted/pull/1278, among others) that were due to Python + Windows falling apart on mgorny's name. After some debugging, I ascertained:

- The environment has Unicode strings in it (because environments are Unicode on Windows)
- but sys.stdout.encoding is cp1252 -- https://www.python.org/dev/peps/pep-0528/ does not apply due to it being a non-interactive console
- One of the characters in the environment is not printable under cp1252, which causes an exception.
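To make the failure concrete, here is a minimal sketch that reproduces it without needing a Windows box. It wraps an in-memory buffer in a text stream with a chosen encoding, which is roughly what a non-interactive sys.stdout looks like. The helper name `try_print` and the sample value are my own illustrations, not anything from the CI logs.

```python
import io


def try_print(text, encoding):
    """Print `text` to an in-memory text stream that uses `encoding`.

    Returns the encoded bytes, or None if the text cannot be encoded.
    """
    raw = io.BytesIO()
    # newline="\n" disables newline translation, so the bytes are the
    # same on every platform.
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="strict", newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()


# Hypothetical environment value: U+0142 has no mapping in cp1252.
value = "Micha\u0142"
print(try_print(value, "cp1252"))  # strict cp1252 cannot encode this
print(try_print(value, "utf-8"))   # UTF-8 can encode any code point
```

The same text that is fine under UTF-8 blows up under cp1252, which is the `print(os.environ)` failure in miniature.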

I think we should avoid running under ANSI-mode by default at all costs, since it causes non-obvious bugs like this (`print(os.environ)` causing an exception). This would also bring Windows in line with UNIX, where we basically assume a non-UTF-8 locale is more or less broken by design and we don't run the tests on it.

It also seems like Windows is heading in the direction of having console output be CP65001 (aka UTF-8), so I think this is a reasonable direction to go in as well. [1] [2] [3]

PEP-528 makes sys.stdout/sys.stdin use the W ("wide", aka UTF-16LE) APIs when attached to a console, as it's assumed that a human is on the other side. For compatibility, Python encodes Unicode to UTF-8 and passes it to WindowsConsoleIO, which then decodes it into UTF-16 and passes it to the console, meaning that writing raw UTF-8 bytes to sys.stdout.buffer works as you'd expect on both Windows and UNIXes. We can enable UTF-8 text output universally with the environment variable `PYTHONIOENCODING=utf8:surrogateescape`. If a user wants ANSI output, they can set the "PYTHONLEGACYWINDOWSSTDIO" environment variable to make Python not perform the Unicode conversions for the console, so we could perhaps use this too, if someone is SURE they want ANSI output.
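As a rough check of the `PYTHONIOENCODING` route, the sketch below starts a child interpreter with the variable set and stdout redirected to a pipe (so non-interactive, like CI), then reports what encoding the child saw and whether a non-cp1252 character survived. The function name and the child snippet are hypothetical; only `PYTHONIOENCODING` itself comes from the discussion above.

```python
import codecs
import os
import subprocess
import sys

# Code run in the child: report the stdout encoding, then write U+0142.
# The escapes are doubled so the child sees '\n' and '\u0142' literally.
CHILD_CODE = (
    "import sys; "
    "sys.stdout.write(sys.stdout.encoding + '\\n' + '\\u0142' + '\\n')"
)


def child_stdout_with_ioencoding(value="utf8:surrogateescape"):
    """Run a child Python with PYTHONIOENCODING set; return (encoding, text)."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    encoding_line, text_line = result.stdout.decode("utf-8").splitlines()
    # Normalise spellings such as "utf8" / "UTF-8" to the codec's canonical name.
    return codecs.lookup(encoding_line).name, text_line


if __name__ == "__main__":
    print(child_stdout_with_ioencoding())
```

The encoding name is normalised through `codecs.lookup` because I would not rely on every Python version reporting it with the same spelling.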

Python 3.7 has PEP-540's `-X utf8` mode, which also does this, more or less, but in a nicer way (no environment variables).
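For comparison, a similar sketch for `-X utf8`, assuming Python 3.7 or later (where `sys.flags.utf8_mode` exists). The helper name is made up, and I drop `PYTHONIOENCODING` from the child's environment so that it cannot mask the effect of the flag.

```python
import codecs
import os
import subprocess
import sys


def child_in_utf8_mode():
    """Run a child Python with -X utf8; return (utf8_mode flag, stdout encoding)."""
    env = dict(os.environ)
    # Remove any inherited override so we observe the -X utf8 behaviour only.
    env.pop("PYTHONIOENCODING", None)
    result = subprocess.run(
        [
            sys.executable,
            "-X",
            "utf8",
            "-c",
            "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)",
        ],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    flag, encoding = result.stdout.decode("ascii").split()
    return int(flag), codecs.lookup(encoding).name


if __name__ == "__main__":
    print(child_in_utf8_mode())
```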

Python 3.5 doesn't seem to work with either of these options. Not sure why. Maybe it's busted.

So, due to this, I would like to propose the following:

- On Windows, raising a deprecation warning when sys.stdout and sys.stderr are not UTF-8 AND the environment variable "PYTHONLEGACYWINDOWSSTDIO" is not set.
- Declaring said environments unsupported and running our tests with -X utf8/PYTHONIOENCODING=utf8 or PYTHONLEGACYWINDOWSSTDIO (which will require skipping some Unicode tests that fail because CP1252 is bad).
- After the deprecation period, start issuing loud RuntimeWarnings saying that you're probably not doing the thing you want to be doing.
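To give a feel for the first item, here is a rough sketch of what such a check could look like. Everything in it (the function names, the message text, treating an empty variable as unset, the parameters that exist only so it can be exercised off Windows) is an assumption for the sake of discussion, not existing Twisted code.

```python
import codecs
import os
import sys
import warnings


def _is_utf8(stream):
    """Return True if the stream reports a UTF-8 encoding."""
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


def warn_if_non_utf8_stdio(platform=None, environ=None, streams=None):
    """Hypothetical check: warn on Windows when stdio is not UTF-8.

    Returns True if a warning was issued. The keyword arguments default to
    the real platform, environment and streams; they are parameters only
    so the logic can be tested anywhere.
    """
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    streams = (sys.stdout, sys.stderr) if streams is None else streams

    if platform != "win32":
        return False
    # Assumption: a non-empty value counts as an explicit opt-in to ANSI.
    if environ.get("PYTHONLEGACYWINDOWSSTDIO"):
        return False
    if all(_is_utf8(stream) for stream in streams):
        return False

    warnings.warn(
        "sys.stdout/sys.stderr are not UTF-8; set PYTHONIOENCODING=utf8 or "
        "run with -X utf8 (or set PYTHONLEGACYWINDOWSSTDIO if you really "
        "want ANSI output).",
        DeprecationWarning,
        stacklevel=2,
    )
    return True
```

After the deprecation period, the same check could switch the category to RuntimeWarning.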

Opinions?

- Amber

[1] https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
[2] https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?view=netcore-3.1#the-default-property-on-net-core
[3] https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
