John Machin wrote:
On Oct 23, 3:03 pm, Ethan Furman <et...@stoneleaf.us> wrote:

John Machin wrote:

On Oct 23, 7:28 am, Ethan Furman <et...@stoneleaf.us> wrote:

Greetings, all!

I would like to add unicode support to my dbf project.  The dbf header
has a one-byte field to hold the encoding of the file.  For example,
\x01 is code-page 437 MS-DOS.

My google-fu is apparently not up to the task of locating a complete
resource that has a list of the 256 possible values and their
corresponding code pages.

What makes you imagine that all 256 possible values are mapped to code
pages?

I'm just wanting to make sure I have whatever is available, and
preferably standard.  :D


So far I have found this, plus variations: http://support.microsoft.com/kb/129631

Does anyone know of anything more complete?

That is for VFP3. Try the VFP9 equivalent.

dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
source: ESRI support site.

Well, a couple hours later and still not more than I started with.
Thanks for trying, though!


Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
keywords and you couldn't come up with anything??

Perhaps "nothing new" would have been a better description. I'd already seen the clicketyclick site (good info there), and all I found at ESRI were folks trying to figure it out, plus one link to a list that was no different from the VFP3 list (or perhaps the list did not give the hex values; either way, it was of no use to me).

I looked at dbase.com, but came up empty-handed there (not surprising, since they are a commercial company).

I searched some more on Microsoft's site in the VFP9 section, and was able to find the code page section this time. Sadly, it only added about seven codes.

At any rate, here is what I have come up with so far. Any corrections and/or additions greatly appreciated.

code_pages = {
    '\x01' : ('cp437', 'U.S. MS-DOS'),
    '\x02' : ('cp850', 'International MS-DOS'),
    '\x03' : ('cp1252', 'Windows ANSI'),
    '\x04' : ('mac_roman', 'Standard Macintosh'),
    '\x64' : ('cp852', 'Eastern European MS-DOS'),
    '\x65' : ('cp866', 'Russian MS-DOS'),
    '\x66' : ('cp865', 'Nordic MS-DOS'),
    '\x67' : ('cp861', 'Icelandic MS-DOS'),
    '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy
    '\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'),      # iffy
    '\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
    '\x6b' : ('cp857', 'Turkish MS-DOS'),

    '\x78' : ('cp950', 'Traditional Chinese (Hong Kong SAR, Taiwan) '
              'Windows'),
    '\x79' : ('cp949', 'Korean Windows'),
    '\x7a' : ('cp936', 'Chinese Simplified (PRC, Singapore) '
              'Windows'),
    '\x7b' : ('cp932', 'Japanese Windows'),
    '\x7c' : ('cp874', 'Thai Windows'),                 # wag
    '\x7d' : ('cp1255', 'Hebrew Windows'),
    '\x7e' : ('cp1256', 'Arabic Windows'),
    '\xc8' : ('cp1250', 'Eastern European Windows'),
    '\xc9' : ('cp1251', 'Russian Windows'),
    '\xca' : ('cp1254', 'Turkish Windows'),
    '\xcb' : ('cp1253', 'Greek Windows'),
    '\x96' : ('mac_cyrillic', 'Russian Macintosh'),
    '\x97' : ('mac_latin2', 'Macintosh EE'),
    '\x98' : ('mac_greek', 'Greek Macintosh') }
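
For what it's worth, here is a minimal sketch of how a table like this might be used, and how the "iffy" entries could be checked against the codecs Python actually ships. It assumes Python 3 (so header bytes index as integers), assumes the language driver ID sits at offset 29 of the dbf header, and uses a small excerpt of the table rather than the whole thing. The names SAMPLE_CODE_PAGES, LDID_OFFSET, codec_for_header and unsupported_codecs are made up for illustration and are not part of any existing dbf module.

```python
import codecs

# Excerpt of the table above, keyed by integer byte value (an assumption:
# Python 3 yields ints when indexing bytes).
SAMPLE_CODE_PAGES = {
    0x02: ('cp850', 'International MS-DOS'),
    0x03: ('cp1252', 'Windows ANSI'),
    0x68: ('cp895', 'Kamenicky (Czech) MS-DOS'),   # iffy
    0xc8: ('cp1250', 'Eastern European Windows'),
}

# Assumed position of the language driver ID / code page mark in the header.
LDID_OFFSET = 29


def codec_for_header(header, table=SAMPLE_CODE_PAGES, default='ascii'):
    """Return a Python codec name for a dbf header given as bytes.

    Falls back to `default` when the ID is not in the table or when
    Python has no codec registered under the mapped name.
    """
    ldid = header[LDID_OFFSET]
    name, _description = table.get(ldid, (default, 'unknown'))
    try:
        codecs.lookup(name)
    except LookupError:
        return default
    return name


def unsupported_codecs(table):
    """List (id, codec name, description) for names Python cannot look up."""
    missing = []
    for ldid, (name, description) in sorted(table.items()):
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append((ldid, name, description))
    return missing
```

Running unsupported_codecs() over the full table is a cheap way to flag entries whose codec name would need a custom codec or a fallback before the mapping is relied on.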

~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list
