STINNER Victor added the comment:

On Windows with Python 2, unencodable characters are replaced with "?". This is
the default behaviour of WideCharToMultiByte(), so all ANSI functions behave
this way. Python doesn't try to behave differently; it just exposes system
functions as Python functions.
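The replacement can be approximated in Python 3 with the "replace" error handler of str.encode(), which also substitutes "?" for characters the target code page cannot represent. This is a minimal sketch, not the Windows code path: cp1252 is assumed as an example ANSI code page, to_ansi() is a hypothetical helper, and the real WideCharToMultiByte() conversion can additionally apply "best fit" mappings for some characters, which this sketch ignores.

```python
# Assumption: cp1252 (Western European) stands in for the ANSI code page.
ANSI_CODEPAGE = "cp1252"

def to_ansi(name):
    """Hypothetical helper: encode a Unicode filename lossily, the way an
    ANSI API would, replacing unencodable characters with "?"."""
    return name.encode(ANSI_CODEPAGE, errors="replace")

print(to_ansi("café.txt"))         # b'caf\xe9.txt' (é exists in cp1252)
print(to_ansi("café_\u4e2d.txt"))  # b'caf\xe9_?.txt' (U+4E2D does not)
```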

So, for example, os.listdir(bytes) returns filenames containing "?" if some
characters are not encodable to the ANSI code page. It's a design choice of Windows.
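Why such a bytes listing is lossy can be shown without touching the filesystem. In this sketch (Python 3, cp1252 assumed as the ANSI code page, filenames chosen for illustration) two distinct Unicode names collapse to the same bytes, and the resulting name contains "?", which is a reserved character in Windows filenames, so it cannot be used to open either original file.

```python
# Illustrative names: two different files a Windows directory could contain.
names = ["\u4e2d.txt", "\u6587.txt"]

# Encode them lossily, as the ANSI (bytes) API would; cp1252 is assumed.
ansi_names = [n.encode("cp1252", errors="replace") for n in names]
print(ansi_names)  # [b'?.txt', b'?.txt']

# Both names map to the same bytes: the originals cannot be recovered.
print(len(set(ansi_names)) < len(names))  # True

# "?" is reserved in Windows filenames, so b'?.txt' does not name a real file.
print(b"?" in ansi_names[0])  # True
```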

> This critical bug is one of the reasons that non-English speaking
> communities doesn't adopt Python as broadly as it happens in
> English world compared to other technologies (PHP etc.).

I don't understand this point.

PHP doesn't have a Unicode type, so I'm quite sure that PHP has exactly the
same issue. And this issue is only solved in Python 3... except if you
explicitly use a bytes filename (for os.listdir/os.walk), but the bytes
filename API has been deprecated on Windows in Python 3.3.

In Python 2, you can use Unicode filenames to work around this issue. But it
doesn't work as well as in Python 3: on UNIX, you will get a similar issue with
undecodable filenames (the opposite of unencodable filenames).
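For reference, Python 3 represents undecodable UNIX filenames with the "surrogateescape" error handler (PEP 383): each invalid byte becomes a lone surrogate, and encoding with the same handler restores the original bytes. The sketch below assumes a UTF-8 filesystem encoding and uses a made-up Latin-1 filename; it calls the codec directly rather than os.fsdecode() so that it behaves the same on every platform.

```python
# A Latin-1 encoded name: not valid UTF-8, i.e. "undecodable".
raw = b"caf\xe9.txt"

try:
    raw.decode("utf-8")  # strict decoding fails on the 0xE9 byte
except UnicodeDecodeError as exc:
    print("undecodable:", exc.reason)

# surrogateescape maps each invalid byte to a surrogate in U+DC80..U+DCFF.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler gives back the exact original bytes.
print(name.encode("utf-8", errors="surrogateescape") == raw)  # True
```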

Read my book for more information: https://github.com/haypo/unicode_book/wiki

--

About listdir_unicode-2.7.patch: Python chose to behave like Windows with
unencodable characters. If you want to change this behaviour, you must change
*all* calls to the Windows ANSI API (which is not trivial). Anyway, as I wrote,
the bytes API is deprecated for filenames in Python 3.3. I prefer not to change
anything in Python 2, because it may break existing applications. For example,
os.listdir(bytes) doesn't fail in Python 2.7 with unencodable names, whereas it
fails with your patch.
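The compatibility concern boils down to lossy replacement versus a hard error. The sketch below only contrasts the two encoding policies; encode_listing() is a hypothetical helper, not code from listdir_unicode-2.7.patch, cp1252 is an assumed ANSI code page, and UnicodeEncodeError is an assumed failure mode since the patch's actual error is not described here.

```python
def encode_listing(names, errors):
    """Hypothetical helper: encode Unicode filenames to the (assumed)
    cp1252 ANSI code page using the given error handler."""
    return [n.encode("cp1252", errors=errors) for n in names]

listing = ["readme.txt", "\u4e2d.txt"]

# Python 2.7-like behaviour: unencodable characters become "?", no exception.
print(encode_listing(listing, "replace"))  # [b'readme.txt', b'?.txt']

# Stricter behaviour: the same call raises, so callers that used to get a
# (lossy) result now get an exception instead.
try:
    encode_listing(listing, "strict")
except UnicodeEncodeError as exc:
    print("fails on:", ascii(exc.object[exc.start:exc.end]))
```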

There is nothing interesting in this issue, so I'm closing it. If you consider
the redirection issue important, please open a new issue.

----------
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16656>
_______________________________________