On Dec 14, 8:07 am, "Chris Rebert" <c...@rebertia.com> wrote:
> On Sat, Dec 13, 2008 at 12:28 PM, John Machin <sjmac...@lexicon.net> wrote:
>
> > Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
> > (Intel)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> >>>> x = u'\u9876'
> >>>> x
> > u'\u9876'
>
> > # As expected
>
> > Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
> > (Intel)] on win 32
> > Type "help", "copyright", "credits" or "license" for more information.
> >>>> x = '\u9876'
> >>>> x
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "C:\python30\lib\io.py", line 1491, in write
> >     b = encoder.encode(s)
> >   File "C:\python30\lib\encodings\cp850.py", line 19, in encode
> >     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> > UnicodeEncodeError: 'charmap' codec can't encode character '\u9876' in
> > position 1: character maps to <undefined>
>
> > # *NOT* as expected (by me, that is)
>
> > Is this the intended outcome?
>
> When Python tries to display the character, it must first encode it
> because IO is done in bytes, not Unicode codepoints. When it tries to
> encode it in CP850 (apparently your system's default encoding judging
> by the traceback), it unsurprisingly fails (CP850 is an old Western
> Europe codec, which obviously can't encode an Asian character like the
> one in question). To signal that failure, it raises an exception, thus
> the error you see.
> This is intended behavior.
I see. That means that the behaviour in Python 1.6 to 2.6 (i.e. encoding the text using the repr() function as then defined) was not intended behaviour?

> Either change your default system/terminal
> encoding to one that can handle such characters or explicitly encode
> the string and use one of the provided options for dealing with
> unencodable characters.

You are missing the point. I don't care about the visual representation. What I care about is an unambiguous representation that can be used when communicating about problems across cultures/networks/mail-clients/news-readers ... the sort of problems that are initially reported as "I got this UnicodeEncodeError" and accompanied by no data or garbled data.

> Also, please don't call it a "crash" as that's very misleading. The
> Python interpreter didn't dump core, an exception was merely thrown.

"Spew nonsense on the screen and then stop" is about as useful and as astonishing as "dump core".

Core? You mean like ferrite doughnuts on a wire trellis? I thought that went out of fashion before cp850 was invented :-)

-- 
http://mail.python.org/mailman/listinfo/python-list
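A minimal Python 3 sketch of both sides of the thread: why echoing the string fails on a cp850 console, and two standard ways to get the unambiguous, ASCII-only representation John is asking for. It assumes cp850 is the console encoding, as the traceback suggests, and only approximates the interactive echo as "repr(), then encode"; the real path goes through sys.displayhook and sys.stdout. The variable names are illustrative.

```python
# Sketch (Python 3). Assumption: the console encoding is cp850, as in the traceback.
x = '\u9876'

# Python 3's repr() leaves printable non-ASCII characters unescaped (PEP 3138),
# so echoing x means encoding the actual character to cp850, which has no mapping for it.
failure = None
try:
    repr(x).encode('cp850')
except UnicodeEncodeError as exc:
    failure = exc
    # Position 1: index 0 of repr(x) is the opening quote.
    print('cp850 failed at position', exc.start, '-', exc.reason)

# Option 1: ascii() escapes every non-ASCII character, much as Python 2's repr() did.
escaped = ascii(x)
print(escaped)   # '\u9876'

# Option 2: encode explicitly with an error handler that substitutes escapes.
encoded = x.encode('cp850', errors='backslashreplace')
print(encoded)   # b'\\u9876'
```

The ascii() output matches what the 2.6 session printed and survives any mail client or news reader, which makes it a reasonable choice for posting problem data; 'backslashreplace' is the better fit when the bytes are actually going to a cp850 device.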