+1 for clarifying this in the kernel documentation, referring to these
multi-emoji glyphs as "emoji ZWJ sequences," and linking to
https://unicode.org/emoji/charts/emoji-zwj-sequences.html
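
For the docs, a small illustration of David's flag example might help. This
is only a sketch: plain Python string slicing stands in for the kernel here
(an assumption, not the kernel's actual implementation), and the helper name
reverse_codepoints is made up for illustration. Strictly speaking, a flag is
a pair of regional indicator symbols rather than a ZWJ sequence, but the
same caveat about codepoint-wise reversal applies to both.

```python
# US flag: REGIONAL INDICATOR SYMBOL LETTER U followed by LETTER S.
us_flag = "\U0001F1FA\U0001F1F8"


def reverse_codepoints(s: str) -> str:
    """Reverse a string one codepoint at a time, ignoring grapheme clusters.

    Hypothetical helper: it stands in for a codepoint-wise reverse kernel.
    """
    return s[::-1]


reversed_flag = reverse_codepoints(us_flag)

# One visible "character", but two codepoints and eight UTF-8 bytes.
print(len(us_flag), len(us_flag.encode("utf-8")))

# The regional indicators now spell "SU" instead of "US".
print([f"U+{ord(c):04X}" for c in reversed_flag])
```

Reversing by grapheme cluster instead would keep the flag intact, but that
needs Unicode segmentation rules, which is probably more than the kernel
should promise.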

Ian


On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> On 17/05/2021 at 17:17, David Li wrote:
> > A little clarification on my point: it's not that a single codepoint
> > gets encoded with more than four bytes, it's that a grapheme
> > cluster/human-delimited 'character' might be multiple codepoints, so
> > reversing the individual codepoints may produce an unexpected
> > result. For instance a flag emoji is actually two codepoints (two
> > special 'letter' codepoints that represent the country code), so
> > reversing a US flag naively will give you an odd '[SU]' instead.
>
> This sounds like saying that reversing a valid French word does not
> produce a valid French word (well, in most cases). The kernel
> documentation can't contain an entire tutorial about Unicode characters
> and what to expect from them, IMHO.
>
> Regards
>
> Antoine.
