Re: [dev] [libgrapheme] announcement

sylvain . bertrand Fri, 27 Mar 2020 15:27:09 -0700

On Fri, Mar 27, 2020 at 10:24:52PM +0100, Laslo Hunhold wrote:
> ... This will cover 99.5% of all cases...


What do you mean? Did they manage to put enough weird edge cases into the
grapheme cluster definition to account for 0.5%??

About string comparison: if I recall correctly, after Unicode normalization
(n11n), strings are supposed to be 100% safe to compare byte by byte.
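
To illustrate the comparison point, here is a minimal sketch using Python's
standard unicodedata module. The helper name same_text and the sample strings
are made up for this example; the only assumption is that both sides are
normalized to the same form before their UTF-8 bytes are compared.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings byte for byte after normalizing both to `form`."""
    na = unicodedata.normalize(form, a).encode("utf-8")
    nb = unicodedata.normalize(form, b).encode("utf-8")
    return na == nb


# The same word, spelled two ways at the codepoint level.
composed = "caf\u00e9"      # 'é' as one pre-composed codepoint
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Without normalization the raw bytes differ.
raw_equal = composed.encode("utf-8") == decomposed.encode("utf-8")
print(raw_equal)                        # False
print(same_text(composed, decomposed))  # True
```

Which form is used does not matter for the comparison, as long as it is the
same one on both sides.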

The more you know: Unicode n11n has made its way into Linux filesystem
support, and quite recently. This will become a problem for terminal-based
applications. In near-future GNU/Linux distros, filenames will be
normalized using the "right way"(TM) n11n.

This "right way"(TM) n11n (there are 2 canonical n11ns, NFC and NFD) produces
only non-pre-composed grapheme clusters of codepoints (but in the CJK realm,
there are exceptions if I recall properly). AFAIK, all terminal-based
applications expect "pre-composed" codepoints.

For instance, the French letter 'è' won't be 1 codepoint anymore, but 'e' + '`'
(I don't recall the n11n order), namely a sequence of 2 codepoints.
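
On the order question, a quick check with Python's standard unicodedata
module (the helper name codepoints is made up for this sketch): in the
decomposed form (NFD) the base letter comes first, followed by the combining
mark, which is U+0300 COMBINING GRAVE ACCENT rather than the ASCII backtick.

```python
import unicodedata


def codepoints(s: str) -> list[str]:
    """List each codepoint of `s` as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


nfd = unicodedata.normalize("NFD", "\u00e8")   # decompose pre-composed 'è'
nfc = unicodedata.normalize("NFC", "e\u0300")  # recompose 'e' + combining grave

print(codepoints(nfd))
# ['U+0065 LATIN SMALL LETTER E', 'U+0300 COMBINING GRAVE ACCENT']
print(codepoints(nfc))
# ['U+00E8 LATIN SMALL LETTER E WITH GRAVE']
```

Either way it is still a single grapheme cluster; only the number of
codepoints (and bytes) changes.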

I am a bit scared because software like ncurses, lynx, links, and vim may use
the abominations of software we discussed earlier to handle all this.

-- 
Sylvain
