I'm fine with pointing out that the function operates on codepoints.
Linking to the Unicode documentation for emojis sounds entirely like a
distraction, though.
Regards
Antoine.
On 17/05/2021 at 17:28, Ian Cook wrote:
+1 for clarifying this in the kernel documentation, referring to these
multi-emoji glyphs as "emoji ZWJ sequences," and linking to
https://unicode.org/emoji/charts/emoji-zwj-sequences.html
Ian
On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou <anto...@python.org> wrote:
On 17/05/2021 at 17:17, David Li wrote:
A little clarification on my point: it's not that a single codepoint
gets encoded with more than four bytes, it's that a grapheme
cluster/human-delimited 'character' might be multiple codepoints, so
reversing the individual codepoints may produce an unexpected
result. For instance, a flag emoji is actually two codepoints (two
regional indicator 'letter' codepoints that spell out the country code),
so naively reversing a US flag gives you the pair for 'SU', which
typically shows up as an odd '[SU]' instead.
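To make the flag case concrete, here is a minimal sketch in plain Python (not the Arrow kernel itself). It assumes that Python string slicing is a fair stand-in for codepoint-wise reversal; `reverse_codepoints` and `indicator_letters` are hypothetical helper names introduced only for this illustration.

```python
# Illustration only: plain Python strings, not the Arrow kernel.
# A US flag is two regional indicator codepoints: letter U, then letter S.
US_FLAG = "\U0001F1FA\U0001F1F8"


def reverse_codepoints(text: str) -> str:
    """Reverse a string one codepoint at a time (hypothetical helper)."""
    return text[::-1]


def indicator_letters(text: str) -> str:
    """Map regional indicator codepoints to ASCII letters (hypothetical helper)."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(ord(ch) - base + ord("A")) for ch in text)


reversed_flag = reverse_codepoints(US_FLAG)

print(len(US_FLAG))                      # codepoints in one visible flag
print(indicator_letters(US_FLAG))        # country code before reversal
print(indicator_letters(reversed_flag))  # country code after reversal
```

The single visible flag counts as two codepoints, and reversing them swaps the country-code letters, so the result is no longer the US flag. Keeping the flag intact would require reversing by grapheme cluster rather than by codepoint, which is a different operation.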
This sounds like saying that reversing a valid French word does not
produce a valid French word (well, in most cases). The kernel
documentation can't contain an entire tutorial about Unicode characters
and what to expect from them, IMHO.
Regards
Antoine.