Martin Pool <m...@sourcefrog.net> added the comment:

On 21 December 2011 12:16, Antoine Pitrou <rep...@bugs.python.org> wrote:
>
> Antoine Pitrou <pit...@free.fr> added the comment:
>
> So, you're complaining about something which works, kind of:
>
> $ touch héhé
> $ LANG=C python3 -c "import os; print(os.listdir())"
> ['h\udcc3\udca9h\udcc3\udca9']
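
For reference, the \udcXX code points in the quoted output are how Python 3's surrogateescape error handler represents bytes it could not decode, so the original bytes are still recoverable. A minimal sketch of decoding such a name by hand, assuming the name on disk really is UTF-8 (the helper name redecode_name is hypothetical, not something from the tracker discussion):

```python
def redecode_name(name, assumed_encoding="utf-8"):
    """Re-decode a surrogate-escaped file name with an assumed encoding.

    Assumption: `name` came from an os.* call that decoded the raw bytes
    as ASCII or UTF-8 with the "surrogateescape" error handler.
    """
    # Turn each escaped surrogate (U+DC80..U+DCFF) back into its original byte.
    # On POSIX, os.fsencode(name) does the same with the filesystem encoding.
    raw = name.encode("utf-8", "surrogateescape")
    # Decode with the encoding we believe the file system actually uses;
    # bytes that still do not fit are escaped again instead of raising.
    return raw.decode(assumed_encoding, "surrogateescape")


escaped = "h\udcc3\udca9h\udcc3\udca9"  # the listdir() result quoted above
print(redecode_name(escaped))
```

This only covers calls that return an escaped name; it does nothing for calls that fail outright.
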

It's possible to work around this in some cases, such as listdir, by
coping with the result including some byte strings, and then manually
decoding them. But there are, iirc, other cases where the call just
fails and there is no easy workaround.

It wasn't impossible to get unicode right in python2, but python3
still thinks it's worth changing things to make it work better.

>> This makes robustly working with non-ascii filenames on different
>> platforms needlessly annoying, given no modern nix should have problems
>> just using UTF-8 in these cases.
>
> So why don't these supposedly "modern" systems at least set the appropriate
> environment variables for Python to infer the proper character encoding?
> (since these "modern" systems don't have a well-defined encoding...)

The standard encoding is UTF-8. Python shouldn't need to have a
variable set to tell it this. Python is making an assumption about
the default but it is a bad assumption.

> The culprit is not Python, it's the Unix crap....

Programs need to work with the environments that are available to
them, even though those environments often have flaws. Windows and
Mac have annoying bugs too, even bugs specifically about Unicode.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13643>
_______________________________________