Hi folks, what seemingly started out as a weird database character encoding mix-up could be boiled down to a few lines of pure Python. The source code below is real UTF-8 (as evidenced by the UTF-8 byte sequence 'c3 a4' in the third line of the hexdump). When just printed, the string "s" is displayed correctly as 'ä' (a-umlaut), but the string representation of the tuple shows '\xe4', so it seems to have been converted to latin-1 'e4' somewhere on the way. How can this be avoided?
dh@jenna:~/python$ cat unicode.py
# -*- encoding: utf8 -*-

s = u'ä'

print(s)
print((s, ))

dh@jenna:~/python$ hd unicode.py
00000000  23 20 2d 2a 2d 20 65 6e  63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
00000010  75 74 66 38 20 2d 2a 2d  0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
00000020  c3 a4 27 0a 0a 70 72 69  6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
00000030  69 6e 74 28 28 73 2c 20  29 29 0a 0a              |int((s, ))..|
0000003c
dh@jenna:~/python$ python unicode.py
ä
(u'\xe4',)
dh@jenna:~/python$
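For what it's worth, here is a small sketch for comparing the bytes in the file with what Python holds after decoding. It is not part of the script above; the names "raw" and "s" are only illustrative, and it assumes the file really is decoded as UTF-8. It suggests that the 'e4' in the tuple output may simply be the code point U+00E4 as written by repr(), rather than a conversion of the data:

```python
# Illustrative sketch; "raw" and "s" are made-up names, not from unicode.py.
raw = b'\xc3\xa4'        # the two bytes 'c3 a4' seen in the hexdump
s = raw.decode('utf-8')  # decoding yields a single character

# Two bytes on disk, one character in memory.
print('%d bytes, %d character' % (len(raw), len(s)))  # 2 bytes, 1 character

# The character's code point is U+00E4, which repr() writes as \xe4.
print('U+%04X' % ord(s))                              # U+00E4

# Encoding back to UTF-8 reproduces the original bytes.
print(s.encode('utf-8') == raw)                       # True

# Latin-1 happens to encode U+00E4 as the single byte e4, hence the
# resemblance; nothing here has been re-encoded.
print(s.encode('latin-1') == b'\xe4')                 # True
```

If that reading is right, printing the elements themselves (for example by joining them into one string first) rather than printing the tuple should show 'ä' again, since printing a container uses repr() for its items.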