STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> WARNING: The filename '@test_464_tmp-共有される' CAN be encoded
> by (...) cp932

We should find a character that is not encodable in any Windows code page but is accepted in filenames.

> characters like "\u2661" or "\u2668" (...)

mbcs uses "ANSI" code pages: cp1250..cp1258 and cp874 (and maybe others, because you wrote that your setup uses cp932):
http://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages

I wrote a short script to find an unencodable filename (attached to this issue). Output:

  u'\u0301' is encodable to cp1258
  u'\u0363' is not encodable to any code page
  u'\u2661' is encodable to cp949
  u'\u5171' is encodable to cp932, cp936, cp949, cp950

(The CODE_PAGES constant of the script might be incomplete.)

u'\u2661' is not a good candidate. u'\u0363' looks better. But we can mix different characters to limit the probability that the whole string is encodable. Example:

  u'\u2661\u5171' is encodable to cp949
  u'\u0301\u0363\u2661\u5171' is not encodable to any code page

> TESTFN_UNICODE_UNDECODEABLE (2.x)

This is a typo fixed by r83987 in py3k.

----------
Added file: http://bugs.python.org/file18823/find_unencode_filename.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________
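
For readers without the attachment, the search described in the comment above could be sketched roughly as follows. This is not the attached find_unencode_filename.py: the contents of CODE_PAGES and the names encodable_code_pages and report are assumptions made for illustration, the code is written for Python 3, and Python's codecs may not match the Windows conversion tables exactly, so results can differ from the output quoted above.

```python
# Assumed list of Windows "ANSI" code pages. Like the CODE_PAGES constant
# mentioned in the comment, it may be incomplete.
CODE_PAGES = ["cp874", "cp932", "cp936", "cp949", "cp950"]
CODE_PAGES += ["cp%d" % number for number in range(1250, 1259)]


def encodable_code_pages(text, code_pages=CODE_PAGES):
    """Return the code pages that can encode every character of text."""
    result = []
    for name in code_pages:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no mapping in this code page.
            continue
        result.append(name)
    return result


def report(text):
    """Build a one-line summary in the style of the output quoted above."""
    pages = encodable_code_pages(text)
    if pages:
        return "%s is encodable to %s" % (ascii(text), ", ".join(pages))
    return "%s is not encodable to any code page" % ascii(text)


if __name__ == "__main__":
    candidates = (
        "\u0301",
        "\u0363",
        "\u2661",
        "\u5171",
        "\u2661\u5171",
        "\u0301\u0363\u2661\u5171",
    )
    for candidate in candidates:
        print(report(candidate))
```

A string is only a useful test filename if encodable_code_pages returns an empty list for it. Mixing characters from different scripts helps because a single unencodable character is enough to make the whole string fail in every code page.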