On Sep 5, 2014 7:57 PM, "Kurt Mueller" <kurt.alfred.muel...@gmail.com> wrote:
> Could someone please explain the following behavior to me:
> Python 2.7.7, MacOS 10.9 Mavericks
>
> >>> import sys
> >>> sys.getdefaultencoding()
> 'ascii'
> >>> [ord(c) for c in 'AÄ']
> [65, 195, 132]
> >>> [ord(c) for c in u'AÄ']
> [65, 196]
>
> My obviously wrong understanding:
> 'AÄ' in 'ascii' are two characters,
> one with ord A=65 and
> one with ord Ä=196 ISO8859-1 <depends on code table>
> --> why [65, 195, 132]?
> u'AÄ' is a Unicode string
> --> why [65, 196]?
>
> It is just the other way round from what I would expect.
Basically, the first string is just a bunch of bytes, exactly as provided by your terminal, which sounds like it uses UTF-8 (perfectly logical in 2014). The second one is converted into a real Unicode representation.

The code point for Ä is U+00C4 (196 decimal). It is no coincidence that this matches latin1, aka ISO 8859-1: Unicode's first 256 code points are exactly the 256 latin1 code points. Please kindly forget encodings other than UTF-8.

BTW: ASCII covers only the first 128 code points (0-127), so Ä is outside ASCII entirely.

-- 
Chris “Kwpolska” Warrick <http://chriswarrick.com/>
Sent from my Galaxy S3.
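P.S. To make the relationship between the two results explicit, here is a quick interpreter session. This is only a sketch, assuming Python 2.7 at a UTF-8 terminal (as yours appears to be); with a different terminal encoding the byte values in the first list would differ:

>>> s = 'AÄ'                  # byte string: 'A' followed by the UTF-8 bytes 0xC3 0x84
>>> u = u'AÄ'                 # unicode string: the code points U+0041 and U+00C4
>>> [ord(c) for c in s]
[65, 195, 132]
>>> [ord(c) for c in u]
[65, 196]
>>> s.decode('utf-8') == u    # decoding the bytes as UTF-8 yields the unicode string
True
>>> u.encode('utf-8') == s    # encoding it as UTF-8 gives back the same bytes
True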