On 25/11/2018 18:51, Robert Latest via Python-list wrote:
> Hi folks,
> what seemingly started out as a weird database character encoding mix-up
> could be boiled down to a few lines of pure Python. The source-code
> below is real utf8 (as evidenced by the UTF code point 'c3 a4' in the
> third line of the hexdump). When just printed, the string "s" is
> displayed correctly as 'ä' (a umlaut), but the string representation
> shows that it seems to have been converted to latin-1 'e4' somewhere on
> the way.

It's not being converted to latin-1. It's a unicode string, as evidenced
by the 'u'. u'\xe4' is a unicode string with one character, U+00E4 (ä).
The 'e4' in the output is the code point written as an escape, not a
latin-1 byte; the two values coincide because the first 256 Unicode code
points match latin-1. print((s, )) shows the repr of the tuple, and in
Python 2 the repr of a unicode string escapes non-ASCII characters.

> How can this be avoided?
>
> dh@jenna:~/python$ cat unicode.py
> # -*- encoding: utf8 -*-
>
> s = u'ä'
>
> print(s)
> print((s, ))
>
> dh@jenna:~/python$ hd unicode.py
> 00000000  23 20 2d 2a 2d 20 65 6e  63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
> 00000010  75 74 66 38 20 2d 2a 2d  0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
> 00000020  c3 a4 27 0a 0a 70 72 69  6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
> 00000030  69 6e 74 28 28 73 2c 20  29 29 0a 0a              |int((s, ))..|
> 0000003c
> dh@jenna:~/python$ python unicode.py
> ä
> (u'\xe4',)
> dh@jenna:~/python$
>
>

--
https://mail.python.org/mailman/listinfo/python-list
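
In case it helps, here is a minimal sketch of the difference between the
code point and its encodings, plus one way to get readable output for a
sequence. It assumes the same one-character string as in the script above;
the helper name show() is made up for illustration and is not from the
original post.

```python
# The same one-character string as u'ä', written with an escape so this
# file stays pure ASCII.
s = u'\xe4'

code_point = ord(s)                 # 0xE4, i.e. U+00E4
utf8_bytes = s.encode('utf-8')      # two bytes, c3 a4, as seen in the hexdump
latin1_bytes = s.encode('latin-1')  # one byte, e4

# The string holds one character regardless of how it is later encoded.
print(len(s), hex(code_point))


def show(items):
    """Join the items themselves rather than relying on the container's repr.

    Hypothetical helper for illustration only.
    """
    return u', '.join(items)


formatted = show((s, u'b'))
```

Printing formatted (or s itself) on a UTF-8 terminal should display the ä
as-is, because print() writes the string rather than its repr. Only the
repr of a container falls back to escapes in Python 2.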