Yeah, piggybacking on what Weston said, the question is where we want to draw
the line: code points, combining character sequences, or graphemes [1]. IME, most
people would want (and assume) that combining characters stay combined in
reversals (using Weston's example: "tréma" becoming "aḿert" (though this
spe
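To make Weston's "tréma" example concrete, here is a minimal Python sketch. A Python `str` is a sequence of code points, so slicing reverses by code point. The helper names `reverse_codepoints` and `reverse_combining_sequences` are hypothetical. The second one treats a base character plus any following marks with a non-zero canonical combining class as one unit, which is a simplification and not full grapheme segmentation.

```python
import unicodedata


def reverse_codepoints(s: str) -> str:
    # Python str indexing is per code point, so this is a code point reversal.
    return s[::-1]


def reverse_combining_sequences(s: str) -> str:
    # Group each base character with the combining marks that follow it
    # (non-zero canonical combining class), then reverse the groups.
    groups: list[str] = []
    for ch in s:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return "".join(reversed(groups))


# "tréma" in decomposed form: t, r, e, U+0301 COMBINING ACUTE ACCENT, m, a
word = unicodedata.normalize("NFD", "tr\u00e9ma")

print(reverse_codepoints(word))           # "am\u0301ert": accent now on the "m"
print(reverse_combining_sequences(word))  # "ame\u0301rt": accent stays on the "e"
```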
FWIW, combining marks were not actually added to support emojis. Emojis
are just one of the more popular uses of the feature. Combining marks are a
standard Unicode feature necessary to represent single “characters” in some
complex situations (e.g. when it is necessary to distinguish between tréma
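As an illustration of why combining marks are a general Unicode feature rather than an emoji add-on: the same visible "é" can be stored as one precomposed code point or as a base letter plus a combining mark, and some letter/mark combinations (such as "x" with a circumflex) have no precomposed form at all. A short sketch using Python's standard `unicodedata` module:

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Both render as "é", but they are different code point sequences.
print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False

# Unicode normalization maps one form onto the other.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# The mark has a non-zero canonical combining class (230, "above").
print([unicodedata.combining(c) for c in decomposed])  # [0, 230]

# "x" + COMBINING CIRCUMFLEX has no precomposed code point, so even
# NFC keeps it as two code points: the combining mark is required.
print(len(unicodedata.normalize("NFC", "x\u0302")))  # 2
```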
Thank you very much for your input, guys. So, based on the discussion, I
will make the following changes.
1. ASCII reverse would throw an error when a non-ASCII (valid or invalid
UTF-8) byte is observed (no change).
2. UTF8 kernel would return garbage output when an invalid UTF-8 c
I'm fine with pointing out that the function operates on codepoints.
Linking to the Unicode documentation for emojis sounds like a
distraction, though.
Regards
Antoine.
On 17/05/2021 at 17:28, Ian Cook wrote:
+1 for clarifying this in the kernel documentation, referring to these
multi-emoji glyphs as "emoji ZWJ sequences," and linking to
https://unicode.org/emoji/charts/emoji-zwj-sequences.html
Ian
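As an illustration of the emoji ZWJ sequence point: a family emoji is several emoji code points joined by U+200D ZERO WIDTH JOINER. Reversing it code point by code point (what a Python slice does) keeps every code point intact but changes the order of the people; whether the reversed sequence still renders as a single glyph depends on whether fonts support that particular sequence.

```python
import unicodedata

# MAN, ZWJ, WOMAN, ZWJ, GIRL: one emoji ZWJ sequence, five code points
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
print(len(family))  # 5

reversed_family = family[::-1]
# Same five code points, now in the order GIRL, ZWJ, WOMAN, ZWJ, MAN
print([unicodedata.name(c) for c in reversed_family])
```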
On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou wrote:
> On 17/05/2021 at 17:17, David Li wrote:
Sure, that is a fair point. But in this case Unicode defines both codepoints and
(extended) grapheme clusters, so I felt it might be worth including a quick note
about which one is being reversed (though to be fair, nearly every language
picks codepoint except maybe Swift, IIUC).
In either case i
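To contrast a codepoint reversal with a grapheme-cluster reversal (the option the note above suggests Swift may take), here is a rough Python sketch. It is only an approximation of the extended grapheme cluster rules in Unicode UAX #29: it attaches combining marks (general categories Mn, Mc, Me) and U+200D ZERO WIDTH JOINER to the preceding cluster, and attaches the code point after a ZWJ to the same cluster. It ignores Hangul syllables, regional-indicator flags, emoji modifiers and other cases a real segmenter handles. The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"


def approx_grapheme_clusters(s: str) -> list[str]:
    # Rough approximation of UAX #29 extended grapheme clusters, not a full
    # implementation: marks and ZWJ extend the current cluster, and the code
    # point following a ZWJ joins the same cluster.
    clusters: list[str] = []
    for ch in s:
        extends = unicodedata.category(ch) in ("Mn", "Mc", "Me") or ch == ZWJ
        follows_zwj = bool(clusters) and clusters[-1].endswith(ZWJ)
        if clusters and (extends or follows_zwj):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reverse_approx_graphemes(s: str) -> str:
    # Reverse the order of clusters while keeping each cluster intact.
    return "".join(reversed(approx_grapheme_clusters(s)))


text = "tre\u0301ma \U0001F468\u200d\U0001F469\u200d\U0001F467!"
print(text[::-1])                      # codepoint reversal: marks and ZWJs move
print(reverse_approx_graphemes(text))  # cluster reversal: "!", family, " amért"
```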
On 17/05/2021 at 17:17, David Li wrote:
A little clarification on my point: it's not that a single codepoint
gets encoded with more than four bytes; it's that a grapheme
cluster (a human-delimited 'character') might span multiple codepoints, so
reversing the individual codepoints may produce an une
nel documentation so people know what
to expect.
-David
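This distinction can be checked directly in Python: no single code point needs more than four bytes in UTF-8, but one user-perceived character can span several code points and therefore many more bytes. Using a family emoji ZWJ sequence as the example:

```python
# MAN, ZWJ, WOMAN, ZWJ, GIRL: usually displayed as one family glyph
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

# UTF-8 length of each individual code point: never more than 4 bytes
per_codepoint = [len(c.encode("utf-8")) for c in family]
print(per_codepoint)  # [4, 3, 4, 3, 4]

# The whole sequence: 5 code points and 18 bytes for one visible character
print(len(family), len(family.encode("utf-8")))  # 5 18
```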
On 2021/05/17 14:48:52, Antoine Pitrou wrote:
> On 17/05/2021 at 16:28, Niranda Perera wrote:
> > Hi all,
> >
> > This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly
> > trivial
On 17/05/2021 at 16:28, Niranda Perera wrote:
Hi all,
This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly
trivial exercise, I would like to clarify a few things.
In the current PR [1], there are 2 reverse kernels, ASCII and UTF8. I'd
like to get some feedback for the following points.
1. For ASCII reve