Thank you very much for your input, guys. Based on the discussion, I will make the following changes.
1. ASCII reverse would throw an error when a non-ASCII byte (valid or invalid UTF-8) is observed (no change).
2. The UTF-8 kernel would return garbage output when an invalid UTF-8 character is observed (no change). Thank you @antoine for the clarification.
3. Edit the documentation to clarify that the kernel works at the code-point level.

On Mon, May 17, 2021 at 11:31 AM Antoine Pitrou <anto...@python.org> wrote:
>
> I'm fine with pointing out that the function operates on codepoints.
>
> Linking to the Unicode documentation for emojis sounds entirely like a
> distraction, though.
>
> Regards
>
> Antoine.
>
>
> On 17/05/2021 at 17:28, Ian Cook wrote:
> > +1 for clarifying this in the kernel documentation, referring to these
> > multi-emoji glyphs as "emoji ZWJ sequences," and linking to
> > https://unicode.org/emoji/charts/emoji-zwj-sequences.html
> >
> > Ian
> >
> >
> > On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >>
> >> On 17/05/2021 at 17:17, David Li wrote:
> >>> A little clarification on my point: it's not that a single codepoint
> >>> gets encoded with more than four bytes, it's that a grapheme
> >>> cluster/human-delimited 'character' might be multiple codepoints, so
> >>> reversing the individual codepoints may produce an unexpected
> >>> result. For instance a flag emoji is actually two codepoints (two
> >>> special 'letter' codepoints that represent the country code), so
> >>> reversing a US flag naively will give you an odd '[SU]' instead.
> >>
> >> This sounds like saying that reversing a valid French word does not
> >> produce a valid French word (well, in most cases). The kernel
> >> documentation can't contain an entire tutorial about Unicode characters
> >> and what to expect from them, IMHO.
> >>
> >> Regards
> >>
> >> Antoine.
>

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
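
For reference, the code-point-level behaviour discussed in this thread can be sketched in plain Python. This is only an illustration, not the kernel implementation: `reverse_codepoints` and `reverse_ascii` are hypothetical helper names, and Python's `str` slicing stands in for a code-point-wise reverse.

```python
def reverse_codepoints(text: str) -> str:
    """Reverse a string one code point at a time (hypothetical helper)."""
    # Python str is a sequence of code points, so slicing reverses
    # code points, not grapheme clusters.
    return text[::-1]


def reverse_ascii(data: bytes) -> bytes:
    """Reverse bytes, rejecting any non-ASCII byte (hypothetical helper)."""
    if not data.isascii():
        raise ValueError("non-ASCII byte observed in input")
    return data[::-1]


# A US flag is two regional-indicator code points: U then S.
us_flag = "\U0001F1FA\U0001F1F8"
reversed_flag = reverse_codepoints(us_flag)

# The reversed sequence is S then U, which is no longer the US flag.
print([hex(ord(c)) for c in us_flag])
print([hex(ord(c)) for c in reversed_flag])
```

Reversing by grapheme cluster instead would need a Unicode text-segmentation step, which is outside what was agreed in this thread.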