Marc 'BlackJack' Rintsch wrote:
> The `unicode()` call doesn't fail here but the ``print`` because printing
> unicode strings means they have to be encoded into a byte string again.
> And whatever encoding the target of the print (your console) uses, it
> does not contain the unicode character u'\xd8'. From the traceback it
> seems your terminal uses `cp437` as encoding.
>
> As you can see here: http://www.wordiq.com/definition/CP437 there's no Ø
> in that character set.
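
The point in the quote can be checked without touching a real console. The sketch below is only an illustration: `can_print_on` is a hypothetical helper name, and the encoding names passed to it stand in for whatever the console actually uses. It shows that decoding the byte succeeds and that only the later encode step depends on the target character set.

```python
def can_print_on(text, console_encoding):
    """Return True if text can be encoded with console_encoding.

    Hypothetical helper: it mimics the encode step that print performs
    when it writes a unicode string to a console.
    """
    try:
        text.encode(console_encoding)
        return True
    except UnicodeEncodeError:
        return False


# Decoding works regardless of the console: this yields u'\xd8'.
text = b"\xd8".decode("iso-8859-1")

print(can_print_on(text, "cp437"))       # cp437 has no U+00D8
print(can_print_on(text, "cp850"))       # cp850 does
print(can_print_on(text, "iso-8859-1"))  # latin1 does
```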

Some things are much, much clearer to me now. Thanks!

For future reference, these links may also help:

http://www.jorendorff.com/articles/unicode/python.html
http://www.thescripts.com/forum/thread23314.html

I've changed my Windows console codepage to latin1, and the following
prints now output "França", as expected:

print unicode("Fran\x87a", "cp850").encode("iso-8859-1")
print unicode("Fran\xe7a", "iso-8859-1").encode("iso-8859-1")

However, I don't yet fully understand what's happening with Pymssql. The
encoding I expected to receive from MS SQL Server was cp850 (the column
in question uses the collation SQL_Latin1_General_CP850_CS_AS), but that
doesn't seem to be what the query returns. I tried converting to a
unicode string from a few different encodings, but none of them seems to
be the right one. For example, for cp850, using a latin1 console:

--------------------------------------------------------
term = unicode(row[1], "cp850")
print repr(term)
print term
---- output -------------------------------------------
u'Fran\xcfa'
FranÏa
--------------------------------------------------------

And for iso-8859-1 (I also got the same result for mbcs):

--------------------------------------------------------
term = unicode(row[1], "iso-8859-1")
print repr(term)
print term
---- output -------------------------------------------
u'Fran\xd8a'
FranØa
--------------------------------------------------------

What do you think? Might Pymssql be doing something wrong?
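
One way to narrow this down is to work backwards from the two outputs above to the raw byte that arrived, and then check which codecs would turn that byte into "ç". The sketch below is only a diagnostic aid under stated assumptions: `raw_bytes`, `codecs_matching` and the `CANDIDATES` list are hypothetical names chosen for illustration, and the candidate list is a small hand-picked sample, not an exhaustive search.

```python
# Byte strings (b"...") and unicode strings (u"...") are written explicitly
# so the sketch behaves the same on Python 2.6+ and Python 3.3+.

# Hand-picked sample of codecs to try (an assumption, not a complete list).
CANDIDATES = ["cp437", "cp850", "cp1252", "iso-8859-1", "iso-8859-15",
              "mac_roman"]


def raw_bytes(text, assumed_codec):
    """Undo a decode by re-encoding text with the codec that produced it."""
    return text.encode(assumed_codec)


def codecs_matching(raw, expected, candidates=CANDIDATES):
    """Return the candidate codecs under which raw decodes to expected."""
    matches = []
    for name in candidates:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # Some codecs leave certain byte values undefined.
            pass
    return matches


# Both reported results point back to the same raw bytes.
from_cp850 = raw_bytes(u"Fran\xcfa", "cp850")
from_latin1 = raw_bytes(u"Fran\xd8a", "iso-8859-1")
print(repr(from_cp850))
print(repr(from_latin1))

expected = u"Fran\xe7a"  # "França"

# Sanity checks with the two byte strings known to work from the post.
print(codecs_matching(b"Fran\x87a", expected))
print(codecs_matching(b"Fran\xe7a", expected))

# The bytes actually received from the query.
print(codecs_matching(from_cp850, expected))
```

Both decodings map back to the byte 0xD8, and none of the sampled codecs reads that byte as "ç". That suggests the value may have been converted somewhere before it reached Python, rather than simply being delivered in an unexpected encoding, though the sketch cannot say where.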