On Thu, Jul 26, 2012 at 6:49 AM, Chris Little <chris...@crosswire.org> wrote:
> Has anyone ever used the -e switch of usfm2osis.pl to do character encoding
> conversion on USFM docs as they're being converted to OSIS?
>
> I'm doing the Python rewrite of usfm2osis and wondered whether I can safely
> dump this functionality. It shouldn't be difficult to implement, but the
> usage statement would be much cleaner without it. Personally, I would likely
> use uconv to change encoding as a preprocessing step, but if anyone actually
> desires to keep this in the markup conversion script, I'll include it.

In Python it should be as trivial as a single line of code, or possibly
two, no?

    utf_input = input.decode(src_encoding)

Possibly followed by

    enc_output = utf_input.encode(destination_encoding)

Provided the source encoding is known to Python, the conversion ought to be
straightforward.

It is also possible to let the user supply a manual encoding conversion
routine if the source encoding is unknown to Python. I've had to do this
before when working with files that use very strange or custom encodings.
Probably more work than is helpful if no one is using it, but it's worth
keeping in consideration.

--Greg

>
> --Chris
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
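
For what it's worth, here is a rough sketch of the decode-then-encode idea above, plus the manual fallback for encodings Python does not know. It assumes Python 3, where the input is a bytes object. The names transcode, is_known_encoding and decode_custom are made up for illustration and are not part of usfm2osis, and the byte-to-character table is a hypothetical example, not a real encoding.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python ships a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def transcode(data, src_encoding, dest_encoding="utf-8"):
    """Decode bytes from src_encoding, then re-encode as dest_encoding."""
    text = data.decode(src_encoding)
    return text.encode(dest_encoding)


def decode_custom(data, table):
    """Decode bytes using a user-supplied {byte value: character} table.

    Bytes below 0x80 that are not in the table are treated as ASCII.
    Any other unmapped byte becomes U+FFFD (the replacement character).
    """
    chars = []
    for byte in data:  # iterating over bytes yields integers in Python 3
        if byte in table:
            chars.append(table[byte])
        elif byte < 0x80:
            chars.append(chr(byte))
        else:
            chars.append("\ufffd")
    return "".join(chars)
```

A caller could check is_known_encoding(src_encoding) first, use transcode when it returns True, and otherwise fall back to decode_custom with a table supplied by the user. Whether that fallback is worth carrying in the converter is the open question from above.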