This is just to record an observation I made when tinkering with diatheke the other day.
Algorithmic transliteration by ICU does not replace Armenian punctuation marks. To illustrate, the counts of these characters in the transliterated text are unchanged from those in the original:

    Code    Char   Count  Name
    U+055B  ՛     11,553  ARMENIAN EMPHASIS MARK
    U+055C  ՜        449  ARMENIAN EXCLAMATION MARK
    U+055D  ՝     70,737  ARMENIAN COMMA
    U+055E  ՞      3,522  ARMENIAN QUESTION MARK
    U+0589  ։     30,366  ARMENIAN FULL STOP
    U+058A  ֊      1,126  ARMENIAN HYPHEN
    U+2024  ․          6  ONE DOT LEADER (used as the Armenian semicolon)

Clearly this is an upstream issue for ICU, but as I'm several layers removed from that, I thought it worthwhile at least to record it here, on the off chance that someone more involved might care to take it up.

cf.

For comparison, I recently did a similar exercise with the Gurmukhi script (for the Punjabi language), and was very pleased to observe that Gurmukhi punctuation marks were suitably replaced by those we use in English and other Latin-script languages.

As for what replacements should be used, most of these are obvious from the Unicode character names. The only one that calls for a better-informed choice is the first: what to use for the emphasis mark. As this is a phonetic construct, my own suggestion would be U+02B9 ʹ MODIFIER LETTER PRIME, though something more appropriate may turn up.

Best regards,

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/Algorithmic-transliteration-by-ICU-and-Armenian-punctuation-marks-tp4656909.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
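Until ICU handles these marks, the replacements proposed in the message above could be applied as a post-processing step on the transliterated text. The following is a minimal sketch of that idea, not anything ICU or SWORD/diatheke actually provides. The function names are hypothetical, and the Latin equivalents are simply read off the Unicode character names, with U+02B9 for the emphasis mark as suggested. It performs a naive one-for-one substitution. In Armenian orthography the emphasis, exclamation and question marks are normally written over the stressed vowel of a word rather than at the end of the sentence, so a more faithful treatment might need to move them. Note also that U+2024 is mapped unconditionally, which would touch any non-Armenian use of that character.

```python
# Hypothetical post-processing step (not part of ICU or SWORD): map the
# Armenian punctuation marks that ICU's transliteration leaves untouched
# to Latin-script equivalents.
ARMENIAN_PUNCTUATION = {
    "\u055B": "\u02B9",  # ARMENIAN EMPHASIS MARK -> MODIFIER LETTER PRIME (suggested)
    "\u055C": "!",       # ARMENIAN EXCLAMATION MARK
    "\u055D": ",",       # ARMENIAN COMMA
    "\u055E": "?",       # ARMENIAN QUESTION MARK
    "\u0589": ".",       # ARMENIAN FULL STOP
    "\u058A": "-",       # ARMENIAN HYPHEN
    "\u2024": ";",       # ONE DOT LEADER (used as the Armenian semicolon)
}

# Build the translation table once; str.translate then replaces every
# mapped character in a single pass over the text.
_TABLE = str.maketrans(ARMENIAN_PUNCTUATION)


def latinize_armenian_punctuation(text: str) -> str:
    """Replace each Armenian punctuation mark with its Latin-script equivalent."""
    return text.translate(_TABLE)


def count_armenian_punctuation(text: str) -> dict:
    """Count each Armenian punctuation mark, e.g. to confirm none survive."""
    return {ch: text.count(ch) for ch in ARMENIAN_PUNCTUATION}


if __name__ == "__main__":
    sample = "Barev\u055D ashkharh\u0589"
    print(latinize_armenian_punctuation(sample))  # prints "Barev, ashkharh."
```

Running `count_armenian_punctuation` before and after the substitution gives the same kind of per-character tally as the table above, which is a quick way to check whether any marks slipped through.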