STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> WARNING: The filename '@test_464_tmp-共有される' CAN be encoded 
> by (...) cp932

We should find a character that is not encodable in any Windows code page, but 
is still accepted in filenames.

> characters like "\u2661" or "\u2668" (...)

mbcs uses "ANSI" code pages: cp1250..cp1258 and cp874 (and maybe others because 
you wrote that your setup uses cp932):
http://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages

I wrote a short script to find an unencodable filename (attached to this issue). 
Output:

u'\u0301' is encodable to cp1258
u'\u0363' is not encodable to any code page
u'\u2661' is encodable to cp949
u'\u5171' is encodable to cp932, cp936, cp949, cp950

(CODE_PAGES constant of the script might be incomplete)

u'\u2661' is not a good candidate; u'\u0363' looks better. But we can mix 
different characters to limit the probability that the whole string is 
encodable. Example:

u'\u2661\u5171' is encodable to cp949
u'\u0301\u0363\u2661\u5171' is not encodable to any code page
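
For illustration, here is a minimal sketch of this kind of check. It is not 
the attached script: the names CODE_PAGES and encodable_code_pages are 
hypothetical, the code page list is an assumption and may be incomplete, and 
it relies on Python's own codecs (Python 3 syntax) rather than the Windows 
API, so results may differ from what Windows actually does.

```python
# Hypothetical sketch, not the script attached to the issue.
# Assumed list of Windows "ANSI" code pages; it may be incomplete.
CODE_PAGES = ["cp874", "cp932", "cp936", "cp949", "cp950"]
CODE_PAGES += ["cp%d" % number for number in range(1250, 1259)]


def encodable_code_pages(text, code_pages=CODE_PAGES):
    """Return the code pages from code_pages that can encode text."""
    encodable = []
    for code_page in code_pages:
        try:
            text.encode(code_page)
        except UnicodeEncodeError:
            # At least one character has no mapping in this code page.
            continue
        encodable.append(code_page)
    return encodable


if __name__ == "__main__":
    candidates = [
        "\u0301",
        "\u0363",
        "\u2661",
        "\u5171",
        "\u2661\u5171",
        "\u0301\u0363\u2661\u5171",
    ]
    for candidate in candidates:
        found = encodable_code_pages(candidate)
        if found:
            print("%a is encodable to %s" % (candidate, ", ".join(found)))
        else:
            print("%a is not encodable to any code page" % candidate)
```

A string is only encodable to a code page if every one of its characters is, 
so concatenating characters can only shrink the set of matching code pages. 
That is why the mixed string is the safer candidate.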

> TESTFN_UNICODE_UNDECODEABLE (2.x)

This is a typo fixed by r83987 in py3k.

----------
Added file: http://bugs.python.org/file18823/find_unencode_filename.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________