This is just to record an observation I made when tinkering with diatheke the
other day.

ICU's algorithmic transliteration does not replace Armenian punctuation
marks; they pass through to the output unchanged.

To illustrate, each of the marks below occurs exactly as often in the
transliterated output as in the original text.

U+055B  ՛  11,553  ARMENIAN EMPHASIS MARK
U+055C  ՜     449  ARMENIAN EXCLAMATION MARK
U+055D  ՝  70,737  ARMENIAN COMMA
U+055E  ՞   3,522  ARMENIAN QUESTION MARK
U+0589  ։  30,366  ARMENIAN FULL STOP
U+058A  ֊   1,126  ARMENIAN HYPHEN
U+2024  ․       6  ONE DOT LEADER (used as the Armenian semicolon)
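
For anyone wanting to repeat the check, counts like those in the table above
can be taken with a few lines of Python. This is only a minimal sketch using
the standard library; the names count_punctuation and report are illustrative
and are not part of diatheke or ICU, and how the original and transliterated
texts are obtained is left to the reader.

```python
from collections import Counter
import unicodedata

# The seven code points listed in the table above.
ARMENIAN_PUNCTUATION = "\u055B\u055C\u055D\u055E\u0589\u058A\u2024"


def count_punctuation(text):
    """Return {character: count} for each tracked mark, zeros included."""
    counts = Counter(ch for ch in text if ch in ARMENIAN_PUNCTUATION)
    return {ch: counts[ch] for ch in ARMENIAN_PUNCTUATION}


def report(text):
    """Format the counts as one line per mark: code point, glyph, count, name."""
    lines = []
    for ch, n in count_punctuation(text).items():
        lines.append(f"U+{ord(ch):04X}  {ch}  {n:>6,}  {unicodedata.name(ch)}")
    return "\n".join(lines)
```

If count_punctuation(original) equals count_punctuation(transliterated), none
of the marks has been replaced.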


Clearly this is an upstream issue for ICU, but as I'm several layers removed
from that project, I thought it worth at least recording here, on the off
chance that someone more involved might care to take it up.

For comparison, I recently did a similar exercise with the Gurmukhi script
(for the Punjabi language), and was very pleased to observe that Gurmukhi
punctuation marks were suitably replaced by those we use in English and
other Latin-script languages.

As for what replacements should be used, most of these are obvious from the
Unicode character names.
The only one that needs a better-informed choice is the first: what to use
for the emphasis mark.
As this is a phonetic construct, my own suggestion would be
U+02B9  ʹ  MODIFIER LETTER PRIME
though there may turn out to be something more appropriate.
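
Until something is done upstream, the replacements discussed above could be
applied as a post-processing step on the transliterated text. The sketch
below is Python with the standard library only. The ASCII targets are my
reading of "obvious from the character names", and mapping U+2024 to a
semicolon is an assumption based on its described use; none of this is
ICU's own rule set, and the names REPLACEMENTS and replace_punctuation are
hypothetical.

```python
# Hypothetical replacement table; every target here is an assumption
# except U+02B9, which is the suggestion made above.
REPLACEMENTS = {
    "\u055B": "\u02B9",  # ARMENIAN EMPHASIS MARK -> MODIFIER LETTER PRIME
    "\u055C": "!",       # ARMENIAN EXCLAMATION MARK
    "\u055D": ",",       # ARMENIAN COMMA
    "\u055E": "?",       # ARMENIAN QUESTION MARK
    "\u0589": ".",       # ARMENIAN FULL STOP
    "\u058A": "-",       # ARMENIAN HYPHEN
    "\u2024": ";",       # ONE DOT LEADER, used as the Armenian semicolon
}

# str.maketrans accepts a dict of single-character keys to replacement strings.
_TABLE = str.maketrans(REPLACEMENTS)


def replace_punctuation(text):
    """Substitute each listed mark one-for-one; all other characters pass through."""
    return text.translate(_TABLE)
```

Note that this is a plain one-for-one substitution: each mark stays exactly
where it stood in the Armenian text, and nothing is repositioned.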

Best regards,

David




--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/Algorithmic-transliteration-by-ICU-and-Armenian-punctuation-marks-tp4656909.html
Sent from the SWORD Dev mailing list archive at Nabble.com.

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
