On Tuesday 02 September 2008 19:12:21 Changwoo Ryu wrote:
> 2008-09-02 (Tue), 13:19 -0500, Adam Majer:
> > Maybe it is just my misunderstanding of UTF-8, I'm not sure. But at
> > least my expected behaviour was being able to select 1 UTF-8 character
> > at a time, even if linguistically it does not make any sense.
>
> The Tibetan code in this case, U+0FA1 is NOT a character. It's a Tibetan
> code for combining with other Tibetan codes to form a Tibetan character.
> Unicode code points do not necessarily represent characters. Selecting
> combined character is more expected than selecting its sub-parts (even
> when it's possible).
>
> This issue is about handling Unicode combining. In this case, Pango
> interprets a quote mark (") and U+0FA1 Tibetan code (wrong combination)
> as one combined character. I'm not sure whether it's a defined behavior.

I did a bit of searching, and the selection behaviour seen makes sense. I
don't know whether using Tibetan combining marks on non-Tibetan characters is
allowed.

Basically, one has to be careful about which definition of 'character' is
being used: a single code point, or what the user perceives as one character
(a grapheme cluster).

 http://www.unicode.org/faq/char_combmark.html#2
 http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
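
To make the code point versus grapheme cluster distinction concrete, here is a
minimal Python sketch. It is only a rough approximation of the rules in the
second link: it attaches every code point whose general category is a mark
(Mn, Mc, Me) to the code point before it. The helper name simple_clusters is
made up for this example, and this is not how Pango does it.

```python
import unicodedata


def simple_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified stand-in for grapheme cluster segmentation: any code point
    whose general category starts with "M" is glued to the previous cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # mark: extend the previous cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters


# A quote mark followed by U+0FA1, surrounded by plain letters.
sample = 'a"\u0fa1b'
print(len(sample))                  # number of code points
print(len(simple_clusters(sample))) # number of approximate clusters
```

With this grouping the quote mark and U+0FA1 end up in the same cluster, which
would match the selection behaviour reported: the cursor moves over the pair
as a unit rather than over each code point.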

