On 06/08/12 14:20, Chris Little wrote:
Linux packagers apparently go the UCS-4 route, so I didn't notice any
issue with using the Language Tags. But trying the above on Windows
shows that the Cygwin build and the builds from python.org (2.7 & 3.2)
all use UCS-2. So my script won't work correctly on Windows.
Not to worry, though. I'll just replace the Language Tags with
Noncharacters in the range U+FDD0-U+FDEF. They're UCS-2-safe since
they're BMP codepoints and they're specifically designated as
"intended for process-internal uses, but are not permitted for
interchange." So in the unlikely event that they appear in input, it's
the fault of the USFM-encoder if anything goes awry.
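
A minimal sketch of guarding against that unlikely case: scan the input for any codepoint in U+FDD0-U+FDEF before those codepoints are used as internal markers. The helper name find_reserved_noncharacters is hypothetical and not taken from the actual script.

```python
# Hypothetical helper: the name and return shape are assumptions for
# illustration, not part of the script discussed in this thread.
NONCHAR_FIRST = 0xFDD0
NONCHAR_LAST = 0xFDEF


def find_reserved_noncharacters(text):
    """Return the offsets of any U+FDD0-U+FDEF codepoints found in text."""
    return [
        offset
        for offset, ch in enumerate(text)
        if NONCHAR_FIRST <= ord(ch) <= NONCHAR_LAST
    ]
```

An empty list means the input is safe to mark up internally; anything else points at the offending offsets, so the USFM source can be reported rather than silently mangled.
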
We'll have to watch for input outside of the BMP on UCS-2 Python,
though, as that could cause problems.
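
One way to watch for this, as a rough sketch: check sys.maxunicode to see whether the interpreter is a UCS-2 build, and scan the input for anything outside the BMP. The function names is_narrow_build and has_non_bmp are made up for this example.

```python
import sys


def is_narrow_build():
    """True on a UCS-2 ("narrow") build, where sys.maxunicode is 0xFFFF."""
    return sys.maxunicode == 0xFFFF


def has_non_bmp(text):
    """True if text appears to contain a codepoint above U+FFFF.

    On a narrow build such a codepoint is stored as a surrogate pair, so
    any surrogate code unit (U+D800-U+DFFF) is treated as a sign of one.
    This also flags a lone surrogate, which is suspect input anyway.
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            return True
    return False
```

The Language Tag characters themselves (for example U+E0001) are outside the BMP, so has_non_bmp reports them on either kind of build.
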
I guess I'm quite surprised that you wrote a new Python program in
Python 2 when its development is basically coming to an end (and the next
Ubuntu release will no longer install it by default). I also wonder whether
Python 3 would handle Unicode better.

(I've been writing all new code in Python 3 for the last couple of years
now.)
Robert.
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page