New submission from Hirokazu Yamamoto <ocean-c...@m2.ccsnet.ne.jp>:

Hello. I noticed that the test suite reports a WARNING every time.
///////////////////////////////////////////////////

E:\python-dev>py3k -m test.regrtest test_os
WARNING: The filename '@test_464_tmp-共有される' CAN be encoded by the filesystem encoding (mbcs). Unicode filename tests may not be effective
(snip)

///////////////////////////////////////////////////

This happens because TESTFN_UNENCODABLE in Lib/test/support.py *is* encodable in a Japanese environment (cp932). It is easy to make it truly unencodable in Japanese by adding characters such as "\u2661" or "\u2668" (the former is a heart mark, the latter is the "Onsen" hot spring mark). With the following change the warning went away:

TESTFN_UNENCODABLE = TESTFN + "-\u5171\u6709\u3055\u308c\u308b\u2661\u2668"

///////////////////////////////////////////////////

There is another issue, which happens only in test_unicode_file:

///////////////////////////////////////////////////

E:\python-dev>py3k -m test.test_unicode_file
Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 12, in <module>
    TESTFN_UNICODE.encode(TESTFN_ENCODING)
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "e:\python-dev\py3k\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 16, in <module>
    raise unittest.SkipTest("No Unicode filesystem semantics on this platform.")
unittest.case.SkipTest: No Unicode filesystem semantics on this platform.

///////////////////////////////////////////////////

This happens because TESTFN_UNICODE cannot be encoded in a Japanese environment.

E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("-\xe0\xf2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'cp932' codec can't encode character '\xe0' in position 1: illegal multibyte sequence

Interestingly, though, the byte sequence "\xe0\xf2" can be read as a cp932 multibyte character.

E:\python-dev>python
Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print "\xe0\xf2"
瑣
>>> "\xe0\xf2".decode("cp932")
u'\u7463'

E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u7463')
瑣

I believe the value "\xe0\xf2" was carried over from Python 2.x; maybe "\u7463" should be used here instead? I'm not sure it can be encoded in every other encoding, though.

----------
components: Tests, Unicode
messages: 115989
nosy: ocean-city
priority: normal
severity: normal
status: open
title: TESTFN_UNICODE and TESTFN_UNDECODABLE
versions: Python 3.1, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________
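
The encodability observations in this report can be checked on any platform with a short script. This is only a sketch: it assumes that Python's built-in cp932 codec behaves like the mbcs filesystem encoding on a Japanese Windows system, which may not hold in every edge case. The helper name can_encode and the candidates table are illustrative and are not part of Lib/test/support.py.

```python
# Sketch: test which candidate filename suffixes the cp932 codec can encode.
# Assumption: Python's "cp932" codec stands in for "mbcs" on Japanese Windows.

def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Labels are descriptive only; the strings are the suffixes from the report.
candidates = {
    "current unencodable suffix": "-\u5171\u6709\u3055\u308c\u308b",
    "proposed suffix (adds U+2661, U+2668)": "-\u5171\u6709\u3055\u308c\u308b\u2661\u2668",
    "current TESTFN_UNICODE suffix": "-\xe0\xf2",
    "possible replacement (U+7463)": "-\u7463",
}

for label, suffix in candidates.items():
    # ascii() keeps the output printable even on a console that cannot
    # display these characters.
    print(label, ascii(suffix), can_encode(suffix, "cp932"))
```

A suffix meant to be unencodable should report False here, and one meant for ordinary Unicode filename tests should report True. The same helper can be called with other codec names to see how portable a candidate is.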