Re: unicode and dbf files

Ethan Furman Mon, 26 Oct 2009 13:26:32 -0700

John Machin wrote:

On Oct 27, 3:22 am, Ethan Furman <et...@stoneleaf.us> wrote:

John Machin wrote:

Try this:
http://webhelp.esri.com/arcpad/8.0/referenceguide/


Wow.  Question, though:  all those codepages mapping to 437 and 850 --
are they really all the same?


437 and 850 *are* codepages. You mean "all those language driver IDs
mapping to codepages 437 and 850". A codepage merely gives an
encoding. An LDID is like a locale; it includes other things besides
the encoding. That's why many Western European languages map to the
same codepage, first 437 then later 850 then 1252 when Windows came
along.

Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps to
cp437, and the file came from a German OEM machine... could that file
have upper-ascii codes that will not map to anything reasonable on my
\x01 cp437 machine? If so, is there anything I can do about it?
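
One way to probe this question, offered as a sketch rather than an answer from the thread: a codepage only fixes the byte-to-character mapping, so any two LDIDs that both name cp437 should decode the same bytes identically. The trouble case would be a file whose bytes were really written under a different codepage, such as cp850. The sample bytes below are illustrative, not taken from a real dbf file.

```python
# Illustrative bytes only; not taken from a real dbf file.
sample = b"Gr\x81\xe1e \x9b"

# The codec depends only on the codepage name, so any two LDIDs that
# both resolve to 'cp437' decode these bytes the same way.
as_cp437 = sample.decode("cp437")
as_cp850 = sample.decode("cp850")

print(as_cp437)
print(as_cp850)

# Bytes where the two codepages disagree are the ones to worry about
# if the LDID in the header does not match how the data was written.
differing = [hex(b) for b in range(0x80, 0x100)
             if bytes([b]).decode("cp437") != bytes([b]).decode("cp850")]
print(len(differing), "high bytes differ between cp437 and cp850")
```

The accented letters common in German text (0x81, 0xe1 here) sit at the same positions in both codepages; the differences are mostly in currency signs, box-drawing characters, and other symbols.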

   '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy

Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
not alone. I suggest that you omit Kamenicky until someone actually
wants it.


Yeah, I noticed that.  Tentative plan was to implement it myself (more
for practice than anything else), and also to be able to raise a more
specific error ("Kamenicky not currently supported" or some such).
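
The more specific error mentioned above could look roughly like this. The table layout follows the `(codec, description)` tuples quoted in this thread. `UnsupportedCodepage` and `codec_for` are hypothetical names, the description for \x01 is assumed, and only a few entries are included.

```python
import codecs

class UnsupportedCodepage(LookupError):
    """Hypothetical error for an LDID whose codec Python does not provide."""

# Small subset of the LDID table, keyed by the header byte.
code_pages = {
    b'\x01': ('cp437', 'U.S. MS-DOS'),               # description assumed
    b'\x68': ('cp895', 'Kamenicky (Czech) MS-DOS'),  # iffy
    b'\x7b': ('cp932', 'Japanese Windows'),
}

def codec_for(ldid):
    """Return the codec name for an LDID byte, or raise a descriptive error."""
    try:
        name, description = code_pages[ldid]
    except KeyError:
        raise UnsupportedCodepage("unknown language driver ID %r" % (ldid,))
    try:
        # codecs.lookup raises LookupError if Python has no such codec.
        codecs.lookup(name)
    except LookupError:
        raise UnsupportedCodepage("%s not currently supported" % description)
    return name

for ldid in (b'\x01', b'\x68'):
    try:
        print(ldid, '->', codec_for(ldid))
    except UnsupportedCodepage as exc:
        print(ldid, '->', exc)
```

Checking with `codecs.lookup` at access time means the entry starts working on its own if a cp895 codec is ever registered, without touching the table.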



The error idea is fine, but I don't get the "implement it yourself for
practice" bit ... practice what? You plan a long and fruitful career
implementing codecs for YAGNI codepages?

ROFL. Playing with code; the unicode/code page interactions. Possibly
looking at constructs I might not otherwise. Since this would almost
certainly (I don't like saying "absolutely" and "never" -- been
troubleshooting for too many years for that!-) be a YAGNI, implementing
it is very low priority.

   '\x7b' : ('iso2022_jp', 'Japanese Windows'),        # wag

Try cp936.


You mean 932?



Yes.
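
A small illustration of why the choice matters, as a sketch with an arbitrary sample string: cp932 and iso2022_jp both cover Japanese text but produce quite different byte streams, so a field written under one will not decode correctly under the other.

```python
# "Nihongo" (the word for the Japanese language), written with escapes
# so the example does not depend on the source file's own encoding.
text = "\u65e5\u672c\u8a9e"

sjis_bytes = text.encode("cp932")      # Windows code page 932
jis_bytes = text.encode("iso2022_jp")  # 7-bit, escape-sequence based

print(sjis_bytes)
print(jis_bytes)

# iso2022_jp stays within 7-bit bytes; cp932 uses bytes above 0x7f.
print(all(b < 0x80 for b in jis_bytes))
print(any(b >= 0x80 for b in sjis_bytes))
```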

Very helpful indeed.  Many thanks for reviewing and correcting.



You're welcome.

Learning to deal with unicode is proving more difficult for me than
learning Python was to begin with!  ;D



?? As far as I can tell, the topic has been about mapping from
something like a locale to the name of an encoding, i.e. all about the
pre-Unicode mishmash and nothing to do with dealing with unicode ...

You are, of course, correct. Once it's all unicode life will be easier
(he says, all innocent-like). And dbf files even bigger, lol.

BTW, what are you planning to do with an LDID of 0x00?

Hmmm. Well, logical choices seem to be either treating it as plain
ascii, and barfing when high-ascii shows up; defaulting to \x01; or
forcing the user to choose one on initial access.


I am definitely open to ideas!
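
To make the three options concrete, here is a rough sketch. The `policy` argument and the function name are hypothetical, not part of any existing dbf library, and the field bytes are made up for illustration.

```python
def codec_for_unset_ldid(policy="ascii", user_choice=None):
    """Pick a codec for a header whose LDID byte is 0x00.

    policy is a hypothetical knob:
      'ascii'   - decode as plain ASCII and fail on any high byte
      'default' - fall back to cp437, the codepage LDID 0x01 maps to
      'ask'     - require the caller to supply a codec name
    """
    if policy == "ascii":
        return "ascii"
    if policy == "default":
        return "cp437"
    if policy == "ask":
        if user_choice is None:
            raise ValueError("LDID is 0x00; please choose a code page")
        return user_choice
    raise ValueError("unknown policy %r" % (policy,))

field = b"Gr\x81n"   # illustrative field data with one high byte

# 'default' decodes silently, right or wrong.
print(field.decode(codec_for_unset_ldid("default")))

# 'ascii' refuses as soon as a high byte shows up.
try:
    field.decode(codec_for_unset_ldid("ascii"))
except UnicodeDecodeError as exc:
    print("ascii policy rejected the data:", exc.reason)
```

The trade-off is between failing loudly (ascii, ask) and guessing quietly (default); which is preferable probably depends on how often 0x00 headers turn up in practice.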

Cheers,

John


--
http://mail.python.org/mailman/listinfo/python-list
