"Ethan Furman" <et...@stoneleaf.us> wrote in message news:4afe4141.4020...@stoneleaf.us...
So I've added Unicode support to my dbf package, but I also have some rather large programs that aren't ready to make the switch yet. As a workaround I added a (rather lame) option to convert the Unicode data that was decoded from the dbf table back into an encoded format.

Here's the fun part: in figuring out what the option should be for use with my system, I tried some tests...

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\xed'
í
>>> print u'\xed'.encode('cp437')
í
>>> print u'\xed'.encode('cp850')
í
>>> print u'\xed'.encode('cp1252')
φ
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1252')

My confusion lies in the discrepancy between my apparent codepage (cp1252) and the character u'\xed', which is absolutely an i with an accent; yet when I encode it with cp1252 and print it, I get an o with a line.

Can anybody clue me in to what's going on here?

Yes. Your console window actually uses cp437, not cp1252. cp850 happens to map this character to the same byte as cp437, but cp1252 does not: it encodes it as byte 0xED, which the cp437 console displays as φ. cp1252 is the default Windows encoding (what Notepad uses, for example):

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> print u'\xed'.encode('cp437')
í
>>> print u'\xed'.encode('cp1252')
φ
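
To see the mismatch without relying on what a particular console draws, here is a rough sketch that encodes the character and then decodes the resulting byte the way a cp437 console would. The helper name show_on_console is made up for illustration, and the code is written to run on Python 3 as well as Python 2 (the sessions above are Python 2):

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of any library): simulate which character a
# console using `console_encoding` draws for text encoded as `data_encoding`.
def show_on_console(text, data_encoding, console_encoding='cp437'):
    raw = text.encode(data_encoding)      # bytes actually written to the console
    return raw.decode(console_encoding)   # how the console interprets those bytes

for enc in ('cp437', 'cp850', 'cp1252'):
    raw = u'\xed'.encode(enc)
    shown = show_on_console(u'\xed', enc)
    # Print code points rather than glyphs so the output is console-independent.
    print('%-6s byte=0x%02X shown=U+%04X' % (enc, ord(raw[0:1]), ord(shown)))
```

cp437 and cp850 both come back as U+00ED (í), while the cp1252 byte comes back as U+03C6 (φ), matching the session above.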

-Mark
