WaterWalk wrote:

> Hello. I just found that on Windows, when an exception is raised and
> traceback info is printed on STDERR, all the characters printed are
> just plain ASCII. Take the unicode character u'\u4e00' for example. If
> I write:
>
> print u'\u4e00'
>
> If the system locale is "PRC China", then this statement will print
> this character as a single Chinese character.
>
> But if I write: assert u'\u4e00' == 1
>
> An AssertionError will be raised and traceback info will be put to
> STDERR, while this time, u'\u4e00' will simply be printed just as
> u'\u4e00', several ASCII characters instead of one single Chinese
> character. I use the coding directive comment (# -*- coding: utf-8 -*-)
> on the first line of the Python source file and also save it in utf-8
> format, but the problem remains.
>
> What's worse, if I directly write Chinese characters in a unicode
> string, when the traceback info is printed, they'll appear in a
> non-readable way, that is, they show themselves as something else. It's
> like printing DBCS characters when the locale is incorrect.
>
> I think this problem isn't unique. When using some other East-Asia
> characters, the same problem may recur.
>
> Is there any workaround to it?
Pass a byte string, but make some effort to use the right encoding:

>>> import sys
>>> assert False, u"\u4e00".encode(sys.stdout.encoding or "ascii",
...                                "xmlcharrefreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: 一

You might be able to do this in the except hook:

$ cat unicode_exception_message.py
import sys

def eh(etype, exc, tb, original_excepthook=sys.excepthook):
    message = exc.args[0]
    if isinstance(message, unicode):
        exc.args = (message.encode(sys.stderr.encoding or "ascii",
                                   "xmlcharrefreplace"),) + exc.args[1:]
    return original_excepthook(etype, exc, tb)

sys.excepthook = eh
assert False, u"\u4e00"
$ python unicode_exception_message.py
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: 一

If Python cannot figure out the encoding, this falls back to ASCII with
XML charrefs:

$ python unicode_exception_message.py 2>tmp.txt
$ cat tmp.txt
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: &#19968;

Note that I've not done any tests; e.g. if there are exceptions with
empty or immutable .args, the except hook itself will fail.

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list
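
One possible way to harden the hook against the caveat about empty or immutable .args, offered as a minimal sketch rather than a tested fix. The names encode_message, make_excepthook and text_type are made up for this illustration and do not come from the thread. The idea is simply to leave the exception untouched whenever the message cannot be rewritten, and to delegate to the original hook in every case.

```python
import sys

# Hypothetical helper name: `unicode` on Python 2, `str` on Python 3,
# so the sketch can be imported under either interpreter.
text_type = type(u"")


def encode_message(message, encoding=None):
    """Encode text for a stream; with no known encoding, fall back to
    ASCII and turn unencodable characters into XML character references."""
    return message.encode(encoding or "ascii", "xmlcharrefreplace")


def make_excepthook(original_excepthook, stream=None):
    """Return a hook that encodes a text message in exc.args[0] and then
    delegates to original_excepthook (hypothetical factory, see lead-in)."""
    def hook(etype, exc, tb):
        try:
            args = exc.args
            if args and isinstance(args[0], text_type):
                target = stream if stream is not None else sys.stderr
                encoding = getattr(target, "encoding", None)
                exc.args = (encode_message(args[0], encoding),) + tuple(args[1:])
        except Exception:
            # Missing or read-only .args, or an unknown codec name:
            # keep the exception as it is and still print the traceback.
            pass
        return original_excepthook(etype, exc, tb)
    return hook
```

It would be installed with sys.excepthook = make_excepthook(sys.excepthook). The stream parameter is only there so the encoding lookup can be tried out without touching the real sys.stderr.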