On 05.09.2014 at 21:16, Kurt Mueller <kurt.alfred.muel...@gmail.com> wrote:
> On 05.09.2014 at 20:25, Chris “Kwpolska” Warrick <kwpol...@gmail.com> wrote:
>> On Sep 5, 2014 7:57 PM, "Kurt Mueller" <kurt.alfred.muel...@gmail.com> wrote:
>>> Could someone please explain the following behavior to me:
>>> Python 2.7.7, MacOS 10.9 Mavericks
>>>
>>> >>> import sys
>>> >>> sys.getdefaultencoding()
>>> 'ascii'
>>> >>> [ord(c) for c in 'AÄ']
>>> [65, 195, 132]
>>> >>> [ord(c) for c in u'AÄ']
>>> [65, 196]
>>>
>>> My obviously wrong understanding:
>>> 'AÄ' in 'ascii' are two characters
>>>   one with ord A=65 and
>>>   one with ord Ä=196 ISO8859-1 <depends on code table>
>>>   --> why [65, 195, 132]
>>> u'AÄ' is a Unicode string
>>>   --> why [65, 196]
>>>
>>> It is just the other way round from what I would expect.
>>
>> Basically, the first string is just a bunch of bytes, as provided by your
>> terminal, which sounds like UTF-8 (perfectly logical in 2014). The second
>> one is converted into a real Unicode representation. The codepoint for Ä is
>> U+00C4 (196 decimal). It's just a coincidence that it also matches latin1
>> aka ISO 8859-1 as Unicode starts with all 256 latin1 codepoints. Please
>> kindly forget encodings other than UTF-8.
>
> So:
> 'AÄ' is a UTF-8 string represented by 3 bytes:
> A -> 41   -> 65           first byte decimal
> Ä -> c384 -> 195 and 132  second and third byte decimal
>
> u'AÄ' is a Unicode string represented by 2 bytes?:
> A -> U+0041 -> 65   first byte decimal, 00 is omitted or not yielded by ord()?
> Ä -> U+00C4 -> 196  second byte decimal, 00 is omitted or not yielded by ord()?

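For reference, the byte-versus-code-point distinction in the quoted exchange can be reproduced with a short sketch. It assumes Python 3 (not the Python 2.7.7 of the original session), where str is always Unicode and bytes plays the role of the Python 2 byte string. The variable names are made up for illustration, and the Ä is written as an escape so the result does not depend on the terminal or source encoding.

```python
# Sketch under the assumption of Python 3: str holds Unicode code points,
# bytes holds raw byte values (the Python 2 'AÄ' case).
text = "A\u00c4"  # 'AÄ', escaped so the source encoding does not matter

# One integer per character: the Unicode code points (the u'AÄ' case).
code_points = [ord(c) for c in text]

# One integer per byte after encoding to UTF-8 (what a UTF-8 terminal sends).
utf8_bytes = list(text.encode("utf-8"))

# For comparison: in Latin-1 the single byte for Ä equals its code point.
latin1_bytes = list(text.encode("latin-1"))

print(code_points)   # [65, 196]
print(utf8_bytes)    # [65, 195, 132]
print(latin1_bytes)  # [65, 196]
```

So ord() on a Unicode string yields code points, not bytes; the bytes only appear once an encoding is chosen.
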
After reading the ord() manual, the second case should read:

u'AÄ' is a Unicode string represented by 2 Unicode characters.
If Python was built with UCS2 Unicode, then each character's code point
must be in the range [0..65535] (16 bits, U+0000..U+FFFF).
A -> U+0041 -> 65   first character decimal (code point)
Ä -> U+00C4 -> 196  second character decimal (code point)

Am I right now?

--
Kurt Mueller, kurt.alfred.muel...@gmail.com