On Mon, Apr 11, 2005 at 01:08:04PM -0700, gcomnz wrote:
: I read "followed by 0 or more combining characters" to mean that it is
: smart enough to combine the vowels in Arabic and other syllabic
: alphabets that use special conjuncts. However I'm also not exactly
: sure if that's even reasonably possible, or even if it makes sense in
: the counting of "characters" for languages that use those.

The "0 or more combining characters" is relying on the exact definition
of combining character in Unicode, which is construed as (somewhat)
language-independent. But the language-dependent level can split up
characters in whatever way makes sense to a native speaker of the
language. That's what it's there for. But you actually have to declare
up front what language you want to work in. Language-independent
graphemes is the highest we can go by default, and that's where I think
we should go by default, because that's closest to what the naïve user
will expect. The smart people will know to drop to codepoints or bytes
when they need that.

Larry
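
As a rough illustration of the three language-independent levels mentioned above (bytes, codepoints, graphemes), here is a minimal Python sketch. It is not Perl 6 and not anyone's actual implementation. The names is_combining and count_levels are made up for the example, "combining character" is assumed to mean Unicode General Category M (Mn, Mc, Me), and the byte count assumes UTF-8. Real grapheme segmentation has more rules than this, so treat it as an approximation of "a base character followed by 0 or more combining characters".

```python
import unicodedata


def is_combining(ch):
    # Assumption: "combining character" means General Category M
    # (Mn, Mc, Me), i.e. a Unicode combining mark.
    return unicodedata.category(ch).startswith("M")


def count_levels(text):
    """Return (bytes, codepoints, graphemes) for text.

    Hypothetical helper: a grapheme here is one character followed by
    0 or more combining characters, which only approximates full
    Unicode grapheme cluster rules.
    """
    n_bytes = len(text.encode("utf-8"))  # byte level (UTF-8 assumed)
    n_codepoints = len(text)             # Python str is indexed by codepoint
    # Every non-combining character starts a new grapheme; a combining
    # character at the very start has no base, so it counts on its own.
    n_graphemes = sum(
        1 for i, ch in enumerate(text) if i == 0 or not is_combining(ch)
    )
    return n_bytes, n_codepoints, n_graphemes


# "naive" with a decomposed diaeresis: i + U+0308
print(count_levels("nai\u0308ve"))   # (7, 6, 5)
# Arabic beh + fatha (a vowel mark, which is a combining character)
print(count_levels("\u0628\u064e"))  # (4, 2, 1)
```

In this sketch the Arabic vowel mark folds into its consonant without declaring any language, because it is a combining mark at the Unicode level. Anything beyond that, such as treating conjuncts as one character, would belong to the language-dependent level.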