+1 for clarifying this in the kernel documentation, referring to these multi-emoji glyphs as "emoji ZWJ sequences," and linking to https://unicode.org/emoji/charts/emoji-zwj-sequences.html
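
For illustration, here is a minimal Python sketch of the behavior described in the quoted message below: reversing by codepoint versus reversing by (approximate) user-perceived character. The helper names (`simple_clusters`, `reverse_codepoints`, `reverse_clusters`) are hypothetical and not part of any kernel API. The grouping logic is a deliberately rough assumption that only pairs regional indicators and keeps ZWJ-joined runs together; it is not the full grapheme cluster algorithm from Unicode UAX #29.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation)


def is_regional_indicator(ch):
    """True for the 26 'letter' codepoints used to build flag emoji."""
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def simple_clusters(text):
    """Rough grouping of codepoints into user-perceived characters.

    Assumption: only regional-indicator pairs, VS16, and ZWJ joins are
    handled. Combining marks, skin-tone modifiers, etc. are ignored.
    """
    clusters = []
    i = 0
    while i < len(text):
        j = i + 1
        if (is_regional_indicator(text[i])
                and j < len(text)
                and is_regional_indicator(text[j])):
            j += 1  # a flag is two regional indicators
        else:
            while j < len(text):
                if text[j] == VS16:
                    j += 1  # keep the variation selector attached
                elif text[j] == ZWJ and j + 1 < len(text):
                    j += 2  # keep ZWJ and the codepoint it joins
                else:
                    break
        clusters.append(text[i:j])
        i = j
    return clusters


def reverse_codepoints(text):
    """Naive reversal: one codepoint at a time."""
    return text[::-1]


def reverse_clusters(text):
    """Reversal that keeps each approximate cluster intact."""
    return "".join(reversed(simple_clusters(text)))


# Regional indicators U + S, and man ZWJ woman ZWJ girl.
US_FLAG = "\U0001F1FA\U0001F1F8"
FAMILY = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
```

With these definitions, `reverse_codepoints(US_FLAG)` yields the regional indicators in the order S, U, while `reverse_clusters(US_FLAG)` leaves the flag unchanged. The same applies to the ZWJ sequence in `FAMILY`. A codepoint-wise kernel behaves like the first function, which is the case the documentation would need to call out.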

Ian

On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> On 17/05/2021 at 17:17, David Li wrote:
> > A little clarification on my point: it's not that a single codepoint
> > gets encoded with more than four bytes, it's that a grapheme
> > cluster/human-delimited 'character' might be multiple codepoints, so
> > reversing the individual codepoints may produce an unexpected
> > result. For instance a flag emoji is actually two codepoints (two
> > special 'letter' codepoints that represent the country code), so
> > reversing a US flag naively will give you an odd '[SU]' instead.
>
> This sounds like saying that reversing a valid French word does not
> produce a valid French word (well, in most cases). The kernel
> documentation can't contain an entire tutorial about Unicode characters
> and what to expect from them, IMHO.
>
> Regards
>
> Antoine.