On 06/08/12 14:20, Chris Little wrote:
Linux packagers apparently go the UCS-4 route, so I didn't notice any issue with using the Language Tags. But trying the above on Windows shows that the cygwin build and the builds from python.org (2.7 & 3.2) all use UCS-2. So my script won't work correctly on Windows.

Not to worry, though. I'll just replace the Language Tags with Noncharacters in the range U+FDD0..U+FDEF. They're UCS-2-safe since they're BMP codepoints and they're specifically designated as "intended for process-internal uses, but are not permitted for interchange." So in the unlikely event that they appear in input, it's the fault of the USFM-encoder if anything goes awry.
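
A rough sketch of the noncharacter idea, written for Python 3. The marker assignments and the helper names (find_reserved, protect, strip_markers) are hypothetical and not taken from the actual script; only the range U+FDD0..U+FDEF comes from the message above.

```python
import re

# The 32 noncharacters U+FDD0..U+FDEF are BMP code points, so each one is
# a single code unit even on a narrow (UTF-16) Python build.
NONCHAR_RE = re.compile('[\uFDD0-\uFDEF]')

# Hypothetical marker assignments, for illustration only.
MARK_OPEN = '\uFDD0'
MARK_CLOSE = '\uFDD1'

def find_reserved(text):
    """Return the sorted list of distinct reserved noncharacters in text."""
    return sorted(set(NONCHAR_RE.findall(text)))

def protect(text, start, end):
    """Wrap text[start:end] in the internal markers."""
    return (text[:start] + MARK_OPEN + text[start:end]
            + MARK_CLOSE + text[end:])

def strip_markers(text):
    """Remove every internal marker before the text leaves the process."""
    return NONCHAR_RE.sub('', text)
```

Calling find_reserved() on the raw input before any markers are added would flag the unlikely case where the USFM source already contains one of these code points.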

We'll have to watch for input outside of the BMP on UCS-2 Python, though, as that could cause problems.
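
One way to watch for such input, as a minimal sketch: sys.maxunicode distinguishes narrow from wide builds, and a scan for surrogates or code points above U+FFFF catches non-BMP characters on either kind. The function name has_non_bmp is an assumption made for this example.

```python
import sys

# sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on wide builds
# (and on every build from Python 3.3 onward).
NARROW_BUILD = sys.maxunicode == 0xFFFF

def has_non_bmp(text):
    """Return True if text contains a character outside the BMP.

    On a narrow build such a character is stored as a surrogate pair, so
    surrogate code units are treated as a match too. A lone surrogate on
    a wide build is therefore also reported.
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            return True
    return False
```

A script could call has_non_bmp() on each input line and warn, or refuse to continue, when NARROW_BUILD is true and the check succeeds.
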
I guess I'm quite surprised that you wrote a new program in Python 2 when its development is basically coming to an end (and the next Ubuntu release will no longer have it installed by default). I also wonder whether Python 3 would handle Unicode better.

(I've been writing all new code in Python 3 for the last couple of years now.)

Robert.

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
