Antoine Pitrou added the comment: With the system Python on s10:
Python 2.6.8 (unknown, Apr 13 2012, 17:08:12) [C] on sunos5 Type "help", "copyright", "credits" or "license" for more information. >>> import locale >>> locale.strxfrm('a') 'a' >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') 'en_US.UTF-8' >>> locale.strxfrm('a') '\x01\x01\x01\x0e\x01\x01\x01\x01\x01\x01\x01\x02\x01\x01\x0fi\x01\x01\x01\x01' >>> locale.strxfrm('a').decode('utf-8') u'\x01\x01\x01\x0e\x01\x01\x01\x01\x01\x01\x01\x02\x01\x01\x0fi\x01\x01\x01\x01' The difference between Python 2 and Python 3 is that Python 3 uses wcsxfrm, not strxfrm. Apparently Solaris' wcsxfrm is some broken thing that returns the same thing as strxfrm, cast to a wchar_t *, hence the character U+101010e (corresponding to the '\x01\x01\x01\x0e' bytestring above). ---------- nosy: +loewis, pitrou _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16258> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com