Re: the unicode saga continues...

Mark Tolonen Fri, 13 Nov 2009 22:55:22 -0800

"Ethan Furman" <[email protected]> wrote in message news:[email protected]...

So I've added unicode support to my dbf package, but I also have some rather large programs that aren't ready to make the switch over yet. So as a workaround I added a (rather lame) option to convert the unicode-ified data that was decoded from the dbf table back into an encoded format.
Here's the fun part: in figuring out what the option should be for use with my system, I tried some tests...
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\xed'
í
>>> print u'\xed'.encode('cp437')
í
>>> print u'\xed'.encode('cp850')
í
>>> print u'\xed'.encode('cp1252')
φ
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
My confusion lies in my apparent codepage (cp1252), and the discrepancy with character u'\xed', which is absolutely an i with an accent; yet when I encode with cp1252 and print it, I get an o with a line.
Can anybody clue me in to what's going on here?

Yes, your console window actually uses cp437. cp850 happens to map to the same character, and cp1252 does not. cp1252 is the default Windows encoding (what Notepad uses, for example):

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> print u'\xed'.encode('cp437')
í
>>> print u'\xed'.encode('cp1252')
φ
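
To make the mismatch visible without relying on the console at all, here is a rough sketch that encodes the character with each codepage and then decodes the resulting bytes as cp437, which is roughly what the console does when it displays them. It is written for Python 3, unlike the 2.x sessions above, and the helper name as_seen_on_console is made up for illustration.

```python
# Sketch (Python 3): simulate what a cp437 console would display for
# bytes produced by different encodings.
# as_seen_on_console is a hypothetical helper, not part of any library.

def as_seen_on_console(text, encoding, console_encoding='cp437'):
    """Encode text, then decode the bytes the way the console would."""
    raw = text.encode(encoding)
    return raw, raw.decode(console_encoding)

for enc in ('cp437', 'cp850', 'cp1252'):
    raw, shown = as_seen_on_console(u'\xed', enc)
    # ascii() keeps the output printable whatever the real console encoding is
    print(enc, repr(raw), ascii(shown))
```

cp437 and cp850 both encode the character as the byte 0xA1, which a cp437 console draws as an accented i. cp1252 encodes it as the byte 0xED, and cp437 assigns that byte to the Greek letter phi, hence the "o with a line".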

-Mark


--
http://mail.python.org/mailman/listinfo/python-list
