On Mon, Apr 11, 2005 at 01:08:04PM -0700, gcomnz wrote:
: I read "followed by 0 or more combining characters" to mean that it is
: smart enough to combine the vowels in Arabic and other syllabic
: alphabets that use special conjuncts. However I'm also not exactly
: sure if that's even reasonably possible, or even if it makes sense in
: the counting of "characters" for languages that use those.

The "0 or more combining characters" relies on the exact
definition of combining character in Unicode, which is construed as
(somewhat) language-independent.  The language-dependent level, by
contrast, can split up characters in whatever way makes sense to a
native speaker of the language.  That's what it's there for.  But you
actually have to declare up front what language you want to work in.
Language-independent graphemes are the highest level we can go to by
default, and that's where I think we should go by default, because
that's closest to what the naïve user will expect.  The smart people
will know to drop to codepoints or bytes when they need that.
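
To make the three levels concrete, here is a rough sketch in Python (not
Perl 6, and not anything from the design documents).  It approximates a
language-independent grapheme as one base codepoint followed by 0 or more
codepoints whose canonical combining class is nonzero.  That rule is a
simplifying assumption, and the real Unicode segmentation rules are more
involved.  The name count_levels is made up for illustration.

```python
import unicodedata


def count_levels(text):
    """Return (graphemes, codepoints, utf8_bytes) for text.

    Assumption: a grapheme is a base codepoint followed by zero or more
    codepoints with a nonzero canonical combining class. This is only an
    approximation of full Unicode grapheme segmentation.
    """
    codepoints = len(text)                  # Python strings index by codepoint
    utf8_bytes = len(text.encode("utf-8"))  # byte level, assuming UTF-8

    graphemes = 0
    for i, ch in enumerate(text):
        # A new grapheme starts at the first codepoint, or at any codepoint
        # that is not a combining mark (combining class 0).
        if i == 0 or unicodedata.combining(ch) == 0:
            graphemes += 1

    return graphemes, codepoints, utf8_bytes


# "naive" spelled with i + COMBINING DIAERESIS (U+0308)
print(count_levels("nai\u0308ve"))      # (5, 6, 7)

# ARABIC LETTER BEH (U+0628) + ARABIC FATHA (U+064E), a combining vowel mark
print(count_levels("\u0628\u064e"))     # (1, 2, 4)
```

In both samples the combining mark is folded into the preceding base
character at the grapheme level, while the codepoint and byte counts still
see it separately.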

Larry
