On 6 jan, 11:03, Ivan <i...@llaisdy.com> wrote:
> Dear All
>
> I'm developing a python application for which I need to support a
> non-standard character encoding (specifically ISO 6937/2-1983, Addendum
> 1-1989). Here are some of the properties of the encoding and its use in
> the application:
>
> - I need to read and write data to/from files. The file format
>   includes two sections in different character encodings (so I
>   shan't be able to use codecs.open()).
>
> - iso-6937 sections include non-printing control characters
>
> - iso-6937 is a variable width encoding, e.g. "A" = [41],
>   "Ä" = [0xC8, 0x41]; all non-spacing diacritical marks are in the
>   range 0xC0-0xCF.
>
> By any chance is there anyone out there working on iso-6937?
>
> Otherwise, I think I need to write a new codec to support reading and
> writing this data. Does anyone know of any tutorials or blog posts on
> implementing a codec for a non-standard character encoding? Would
> anyone be interested in reading one?
>

Take a look at the files (Python modules) in ...\Lib\encodings. This is
where all the codecs are centralized. Python picks them up automatically,
as long as they are present in that directory.

I remember that, a long time ago, just for fun, I created such a codec
quite easily. I picked one of the files as a template and modified its
"table". It was a byte <-> byte table. For a multibyte coding scheme it
may be a little more complicated; you may take a look, e.g., at the
mbcs.py codec.

The distribution of such a codec may be a problem.

----

Another simple approach, OS independent.

You probably do not write your code in iso-6937; you only need to
encode/decode some byte sequences "on the fly". In that case, work with
bytes and create a pair of encoding/decoding functions with a hand-built
<dict> [*] as helper. It's not so complicated. Use <unicode> (Py2) or
<str> (Py3, the recommended way ;-) ) as the pivot encoding.

[*] I also once created such a dict from
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
I never checked whether it corresponds to the "official" cp1252 codec.

jmf
--
http://mail.python.org/mailman/listinfo/python-list
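
P.S. For what it's worth, here is a rough Python 3 sketch of the
dict-based approach described above. It is only an illustration under
several assumptions: the names DIACRITICS, decode_6937 and encode_6937
are made up, the table is deliberately partial (only the 0xC8 entry comes
from the original post; the others should be checked against the
standard), and everything outside ASCII plus those few diacritics is
rejected. The one real trick is the ordering: in iso-6937 the diacritic
byte comes before the base letter, while a Unicode combining mark comes
after it.

```python
import unicodedata

# Partial, illustrative table: ISO 6937 non-spacing diacritic byte ->
# Unicode combining character. Only 0xC8 is confirmed by the original
# post; verify the other entries against the standard before relying on them.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}
COMBINING_TO_BYTE = {mark: byte for byte, mark in DIACRITICS.items()}


def decode_6937(data):
    """Decode bytes to str (ASCII plus the diacritics listed above only)."""
    out = []
    it = iter(data)
    for b in it:
        if 0xC0 <= b <= 0xCF:
            # Diacritic byte: it applies to the base letter that follows.
            base = next(it, None)
            if b not in DIACRITICS or base is None or base >= 0x80:
                raise ValueError("cannot decode diacritic byte 0x%02X" % b)
            out.append(chr(base) + DIACRITICS[b])
        elif b < 0x80:
            # ASCII, including control characters, passes through unchanged.
            out.append(chr(b))
        else:
            raise ValueError("unmapped byte 0x%02X" % b)
    # Compose base + combining mark into a single code point where possible.
    return unicodedata.normalize("NFC", "".join(out))


def encode_6937(text):
    """Encode str to bytes (inverse of decode_6937, same limitations)."""
    out = bytearray()
    can_mark = False  # True when the last byte is a base without a diacritic
    for ch in unicodedata.normalize("NFD", text):
        if ch in COMBINING_TO_BYTE:
            if not can_mark:
                raise ValueError("diacritic without a single base character")
            # Put the diacritic byte in front of the base byte just written.
            out.insert(len(out) - 1, COMBINING_TO_BYTE[ch])
            can_mark = False
        elif ord(ch) < 0x80:
            out.append(ord(ch))
            can_mark = True
        else:
            raise ValueError("unmapped character %r" % ch)
    return bytes(out)
```

Because these are plain functions working on bytes, they can be applied
to just the iso-6937 section of a file, leaving the other section to
whatever codec it needs.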