On Jan 31, 2008, at 3:08 PM, DM Smith wrote:

> I imagine there is a C/C++ routine that will convert from an entities
> codepoint to a UTF-8 Character.

The numeric entities can presumably be interpreted as UTF-32 code points and encoded as UTF-8 on that basis, using either ICU's routines or those in Sword. The one hangup might be if someone encodes UTF-16 surrogate pairs as entities. I'm not even sure whether that is legal, much less how likely anyone is to do it.

> I'm working on adding -n to osis2mod that will normalize UTF-8 to NFC.
> There's a bug in it and I'll be posting separately about it.

Are you using ICU? There's code in utf8nfc.cpp (in the filters directory) that should work to do the translation. We might even be able to use ICU to solve the surrogates issue with a little work.

--Chris
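
For illustration, here is a minimal sketch of the conversion discussed above, using only the standard library. It is not the ICU or Sword code; the names encodeUtf8 and combineSurrogates are hypothetical helpers invented for this example. It assumes the entity's number has already been parsed into an integer, treats that integer as a UTF-32 code point, and rejects lone surrogates instead of encoding them. If a high and a low surrogate turn up as two consecutive entities, one possible approach is to combine them into a single code point first.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Sword or ICU): encode one code point,
// taken as a UTF-32 value, into UTF-8. Returns an empty string if the
// value is a surrogate (U+D800..U+DFFF) or lies above U+10FFFF.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        return out;  // not a valid Unicode scalar value
    }
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: combine a UTF-16 high/low surrogate pair into one
// code point. Returns 0 if the two values do not form a valid pair
// (a valid pair always yields a value of at least 0x10000).
std::uint32_t combineSurrogates(std::uint32_t hi, std::uint32_t lo) {
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return 0;
    }
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}
```

With these, an entity such as &#8364; (0x20AC) would go through encodeUtf8(0x20AC), while a pair written as &#xD834;&#xDD1E; could be handled as encodeUtf8(combineSurrogates(0xD834, 0xDD1E)). Whether to accept such pairs at all or to flag them as errors is a design choice this sketch leaves open.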