On Tuesday 02 September 2008 19:12:21 Changwoo Ryu wrote:
> 2008-09-02 (Tue), 13:19 -0500, Adam Majer:
> > Maybe it is just my misunderstanding of UTF-8, I'm not sure. But at
> > least my expected behaviour was being able to select 1 UTF-8 character
> > at a time, even if linguistically it does not make any sense.
>
> The Tibetan code in this case, U+0FA1 is NOT a character. It's a Tibetan
> code for combining with other Tibetan codes to form a Tibetan character.
> Unicode code points do not necessarily represent characters. Selecting
> combined character is more expected than selecting its sub-parts (even
> when it's possible).
>
> This issue is about handling Unicode combining. In this case, Pango
> interprets a quote mark (") and U+0FA1 Tibetan code (wrong combination)
> as one combined character. I'm not sure whether it's a defined behavior.

I did a bit of searching, and the selection behaviour seen makes sense:
U+0FA1 is a combining mark, and a combining mark is grouped with the
character before it when grapheme cluster boundaries are determined. I
don't know whether using Tibetan combining marks on non-Tibetan
characters is allowed.

Basically, one has to be careful about which definition of 'character'
is being used.

http://www.unicode.org/faq/char_combmark.html#2
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
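
For what it's worth, here is a rough Python sketch of the code point
versus grapheme cluster distinction discussed above. It is not what
Pango does internally, and it is not the full TR29 algorithm: the
helper rough_clusters() is a hypothetical name of mine, and it only
attaches code points whose general category starts with "M" (marks) to
the preceding code point. That simplification is enough to show why a
quote mark followed by U+0FA1 ends up selected as one unit.

```python
import unicodedata


def rough_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified approximation of TR29 grapheme clusters (assumption:
    only general categories Mn, Mc and Me are treated as extending
    the previous code point; Hangul, ZWJ sequences etc. are ignored).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach it to the preceding cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


# A quote mark followed by U+0FA1, surrounded by plain letters.
sample = 'a"\u0fa1b'

print(unicodedata.category("\u0fa1"))   # general category of U+0FA1
print(len(sample))                      # number of code points
print(len(rough_clusters(sample)))      # number of approximate clusters
```

With this approximation the sample has four code points but only three
clusters, because U+0FA1 is a nonspacing mark and gets attached to the
quote mark in front of it, whether or not that combination makes
linguistic sense.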