On Sat, Dec 13, 2008 at 12:28 PM, John Machin <sjmac...@lexicon.net> wrote:
>
> Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = u'\u9876'
> >>> x
> u'\u9876'
>
> # As expected
>
> Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u9876'
> >>> x
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\python30\lib\io.py", line 1491, in write
>     b = encoder.encode(s)
>   File "C:\python30\lib\encodings\cp850.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u9876' in position 1: character maps to <undefined>
>
> # *NOT* as expected (by me, that is)
>
> Is this the intended outcome?

When Python displays the character, it must first encode it, because I/O is done in bytes, not Unicode code points. Encoding it as CP850 (apparently your system's default encoding, judging by the traceback) fails, unsurprisingly: CP850 is an old Western European code page and can't represent an Asian character like the one in question. Python signals that failure by raising an exception, hence the error you see. This is intended behavior.

Either change your default system/terminal encoding to one that can handle such characters, or explicitly encode the string and use one of the provided error-handling options for unencodable characters.

Also, please don't call it a "crash"; that's very misleading. The Python interpreter didn't dump core; an exception was merely raised. There's a world of difference.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
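
P.S. In case it helps, here is a rough sketch of the "explicitly encode" option. It assumes Python 3 and that CP850 really is the encoding in play; the variable names are mine, purely for illustration. The error handlers shown ('replace', 'backslashreplace', 'ignore') are the standard ones accepted by str.encode().

```python
# Illustrative only: variable names are hypothetical, not from the original post.
x = '\u9876'

# The default error handler ('strict') raises, which is what the traceback shows.
try:
    x.encode('cp850')
except UnicodeEncodeError as exc:
    print('strict encoding failed:', exc.reason)

# Alternative handlers trade the exception for a lossy or escaped result.
replaced = x.encode('cp850', 'replace')           # unencodable char becomes b'?'
escaped = x.encode('cp850', 'backslashreplace')   # becomes the escape b'\\u9876'
dropped = x.encode('cp850', 'ignore')             # unencodable char is dropped

# An encoding that covers the character needs no error handler at all.
utf8 = x.encode('utf-8')

# ascii() gives an escaped representation that any terminal can display.
print(ascii(x))
```

Which handler is appropriate depends on whether you would rather lose the character, see an escape sequence, or switch the output encoding altogether.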