According to http://crosswire.org/wiki/DevTools:ICU, SWORD makes use of ICU for casing (used in search), normalization, and script transliteration.
*Which version of Unicode do we employ for normalization to NFC?*

Some composite glyphs that use two combining characters in the *Myanmar* block are treated differently under the current version of Unicode than they were under Unicode 3.2. The two combining characters, with their Unicode code points, are:

U+1037 ့ MYANMAR SIGN DOT BELOW
U+103A ် MYANMAR SIGN ASAT

This pair of combining characters occurs many, many times in the BurJudson module.

Software that includes normalization should be tested against the official Unicode Normalization Test http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt (2.2MB) for that version of Unicode.

Testing the normalization of the sequence U+1000 U+103A U+1037 with the ICU Normalization Browser (which uses the "International Components for Unicode" library, the most widely used Unicode software library), we can verify that it does indeed normalize to U+1000 U+1037 U+103A, with reordering: see http://bit.ly/nqYzQp. However, if you run the same test for Unicode 3.2 (released March 2002, and so almost 10 years out of date), there is no reordering: see http://bit.ly/orZ7df.

/NB. I used the URL shortener to allow parameters to be passed to the test page more easily/.

The process of converting a string to NFC or NFD requires a stage called "canonical ordering", whereby each run of combining characters is reordered in ascending order of canonical combining class [ccc]. See http://www.unicode.org/reports/tr15/?win#Description_Norm. U+103A MYANMAR SIGN ASAT has ccc=9, whereas U+1037 MYANMAR SIGN DOT BELOW has ccc=7; therefore U+1037 is reordered before U+103A.

The present module BurJudson has SwordVersionDate=2008-03-01. It looks very much as if its normalization was done according to Unicode 3.2.

Context: this question arises from the possibility of creating a new module from a better source text. If we use the latest SWORD utilities to make the new module, will it normalize correctly?
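For anyone who wants to reproduce the reordering without the ICU test page, here is a minimal sketch in Python. It uses the standard `unicodedata` module rather than ICU or the SWORD utilities, so it only illustrates the canonical ordering step; it says nothing about what SWORD itself does. The helpers `canonical_order` and `ccc_without_asat` are hypothetical names of my own. The second one rests on an assumption: as far as I know, U+103A was only added in Unicode 5.1, so an older database would see it as unassigned with ccc=0, which would explain why Unicode 3.2 does not reorder.

```python
import unicodedata


def canonical_order(text, ccc=unicodedata.combining):
    """Sort each run of non-starters (ccc > 0) by ascending ccc.

    Starters (ccc == 0) never move; sorted() is stable, so characters
    with equal ccc keep their original relative order.
    """
    out = []
    run = []  # non-starters seen since the last starter
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


def ccc_without_asat(ch):
    # Assumption: models an older Unicode database in which U+103A is
    # unassigned and therefore has combining class 0.
    return 0 if ch == "\u103a" else unicodedata.combining(ch)


seq = "\u1000\u103a\u1037"        # KA + ASAT + DOT BELOW
reordered = "\u1000\u1037\u103a"  # KA + DOT BELOW + ASAT

# Combining classes as seen by the Unicode database bundled with Python.
print([unicodedata.combining(c) for c in seq])

# Current data: DOT BELOW (ccc=7) moves in front of ASAT (ccc=9).
print(canonical_order(seq) == reordered)
print(unicodedata.normalize("NFC", seq) == reordered)

# Modelled older data: ASAT acts as a starter, so nothing moves.
print(canonical_order(seq, ccc_without_asat) == seq)
```

Running the same comparison over the text of a module would show whether the ASAT and DOT BELOW pairs are stored in the old or the new order.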
David