Re: String reverse kernel

2021-05-17 Thread Jonathan Keane
Yeah, piggybacking on what Weston said: is the line that we want to draw at code points, combining character sequences, or graphemes [1]? IME, most people would want/assume that combining characters would stay combined in reversals (using Weston's example: "tréma" becoming "aḿert" (though this spe
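
To make the three candidate boundaries concrete, here is a minimal Python sketch, not the kernel under discussion. It assumes "tréma" is spelled with a combining acute accent (U+0301) rather than the precomposed é, and both helper names are hypothetical. The second helper only keeps combining marks attached to their base character; it does not implement full extended grapheme cluster segmentation.

```python
import unicodedata


def reverse_codepoints(s: str) -> str:
    # Plain code point reversal: a combining mark ends up after
    # whichever character now precedes it.
    return s[::-1]


def reverse_combining_sequences(s: str) -> str:
    # Hypothetical helper: group each base character with the combining
    # marks that follow it, then reverse the groups.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


word = "tre\u0301ma"  # assumption: 'e' followed by U+0301 COMBINING ACUTE ACCENT
by_codepoint = reverse_codepoints(word)          # accent moves onto the 'm'
by_sequence = reverse_combining_sequences(word)  # accent stays on the 'e'
```

With this input, the code point reversal moves the accent onto the "m", while the sequence-aware version keeps it on the "e".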

Re: String reverse kernel

2021-05-17 Thread Weston Pace
FWIW, combining marks were not actually added to support emojis. Emojis are just one of the more popular uses of the feature. Combining marks are a standard Unicode feature necessary to represent single “characters” in some complex situations (e.g. when it is necessary to distinguish between tréma

Re: String reverse kernel

2021-05-17 Thread Niranda Perera
Thank you very much for your inputs, guys. So, based on the discussion, I will make the following changes. 1. ASCII reverse would throw an error when a non-ASCII (valid/invalid utf8) byte is observed (no change) 2. UTF8 kernel would return a garbage output when an invalid utf8 c
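
The two behaviours described above can be sketched in Python over raw bytes. This is an illustrative sketch only, not the code in the PR: the function names `ascii_reverse` and `utf8_reverse` and the use of `ValueError` are assumptions made for this example. The UTF-8 version does no validation, so invalid input yields meaningless output of the same length rather than an error.

```python
def ascii_reverse(data: bytes) -> bytes:
    # Reject any byte with the high bit set, whether or not it is part
    # of a valid UTF-8 sequence.
    if any(b >= 0x80 for b in data):
        raise ValueError("non-ASCII byte in input")
    return data[::-1]


def utf8_reverse(data: bytes) -> bytes:
    # Walk backwards; every byte that is not a continuation byte
    # (0b10xxxxxx) starts a code point, so copy it together with the
    # continuation bytes that follow it.
    out = bytearray()
    end = len(data)
    for i in range(len(data) - 1, -1, -1):
        if (data[i] & 0xC0) != 0x80:
            out += data[i:end]
            end = i
    # Stray leading continuation bytes (invalid input) are appended
    # unchanged: garbage in, garbage out, with no error raised.
    out += data[:end]
    return bytes(out)


reversed_text = utf8_reverse("h\u00e9llo".encode("utf-8")).decode("utf-8")
```

Because no bytes are added or dropped, the output always has the same length as the input, which lets a kernel preallocate its output buffer.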

Re: String reverse kernel

2021-05-17 Thread Antoine Pitrou
I'm fine with pointing out that the function operates on codepoints. Linking to the Unicode documentation for emojis sounds entirely like a distraction, though. Regards Antoine. On 17/05/2021 at 17:28, Ian Cook wrote: +1 for clarifying this in the kernel documentation, referring to the

Re: String reverse kernel

2021-05-17 Thread Ian Cook
+1 for clarifying this in the kernel documentation, referring to these multi-emoji glyphs as "emoji ZWJ sequences," and linking to https://unicode.org/emoji/charts/emoji-zwj-sequences.html Ian On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou wrote: > > > On 17/05/2021 at 17:17, David Li wrote:

Re: String reverse kernel

2021-05-17 Thread David Li
Sure, that is a fair point. But in this case Unicode defines both codepoint and (extended) grapheme cluster, so I felt it might be worth including a quick note about which one is being reversed (though to be fair, nearly every language picks codepoint except maybe Swift, IIUC). In either case i

Re: String reverse kernel

2021-05-17 Thread Antoine Pitrou
On 17/05/2021 at 17:17, David Li wrote: A little clarification on my point: it's not that a single codepoint gets encoded with more than four bytes, it's that a grapheme cluster/human-delimited 'character' might be multiple codepoints, so reversing the individual codepoints may produce an une
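
A small Python illustration of this distinction, using one emoji ZWJ sequence as an assumed example (man, ZWJ, woman, ZWJ, girl). No code point needs more than four UTF-8 bytes, yet the single visible glyph spans five code points, and reversing those code points gives a different sequence.

```python
# Assumed example: a family emoji built as an emoji ZWJ sequence.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# One glyph on screen, but five code points underneath.
codepoints = [f"U+{ord(c):04X}" for c in family]

# Every individual code point still fits in at most four UTF-8 bytes.
max_bytes_per_codepoint = max(len(c.encode("utf-8")) for c in family)
total_bytes = len(family.encode("utf-8"))

# Code point reversal reorders the parts of the sequence (girl, ZWJ,
# woman, ZWJ, man), so the result is no longer the original glyph.
reversed_family = family[::-1]
reversed_codepoints = [f"U+{ord(c):04X}" for c in reversed_family]
```

How the reversed sequence is drawn depends on the font and renderer, which is why documenting that the kernel reverses code points tells users what to expect.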

Re: String reverse kernel

2021-05-17 Thread David Li
nel documentation so people know what to expect. -David On 2021/05/17 14:48:52, Antoine Pitrou wrote: > > On 17/05/2021 at 16:28, Niranda Perera wrote: > > Hi all, > > > > This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly > > trivial

Re: String reverse kernel

2021-05-17 Thread Antoine Pitrou
On 17/05/2021 at 16:28, Niranda Perera wrote: Hi all, This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly trivial exercise, I would like to clarify a few things. In the current PR [1], there are 2 reverse kernels, ASCII and UTF8. I'd like to get some feedbac

String reverse kernel

2021-05-17 Thread Niranda Perera
Hi all, This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly trivial exercise, I would like to clarify a few things. In the current PR [1], there are 2 reverse kernels, ASCII and UTF8. I'd like to get some feedback for the following points. 1. For ASCII reve