On Wednesday, 26 June 2024 at 13:46 +0200, Maxime Devos wrote:
> > > > Maybe `the number of codepoints` will work here.
> > (string-length "👨🏭") ;; => 3
> > (string-length "é") ;; => 2
> > > The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.

Agreed. "The number of code points" would be correct, but "the number of characters" (i.e., the current wording) is correct too. In Scheme terminology, a character is just a Unicode code point, as can be seen from the name of the procedure char? and related APIs.

> You need to fix your setup, that’s not what Guile does.

No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT, which is two characters, unlike é, U+00E9 LATIN SMALL LETTER E WITH ACUTE, which is one. Likewise 👨🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY, which is three characters.

The "visual characters" are called grapheme clusters, and AFAIK Guile doesn't provide any API that relates to grapheme clusters. (Note that the number of grapheme clusters in a given string depends on the Unicode database and therefore on the Unicode version.)

There are programming languages where the data type called "character" corresponds to grapheme clusters, but I don't think this is common. Swift is the only example I know.

Obligatory reading: https://hsivonen.fi/string-length/
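
For anyone who wants to check the counts above without depending on how their editor or mail client normalizes the text, here is a minimal sketch for a Guile REPL. It builds the strings from explicit code points. The variable names and the helper code-points are made up for illustration, and I am assuming a Guile with Unicode strings (2.0 or later).

```scheme
;; Build each string from explicit code points, so the result does not
;; depend on how the source text was encoded or normalized in transit.
(define decomposed  (string #\e #\x0301))    ; e + COMBINING ACUTE ACCENT
(define precomposed (string #\x00E9))        ; LATIN SMALL LETTER E WITH ACUTE
(define worker
  (string #\x1F468 #\x200D #\x1F3ED))        ; MAN + ZERO WIDTH JOINER + FACTORY

;; string-length counts characters (code points), not grapheme clusters.
(display (string-length decomposed))  (newline)   ; prints 2
(display (string-length precomposed)) (newline)   ; prints 1
(display (string-length worker))      (newline)   ; prints 3

;; Hypothetical helper: list the code points of a string in hexadecimal.
(define (code-points s)
  (map (lambda (c) (number->string (char->integer c) 16))
       (string->list s)))

(display (code-points worker)) (newline)

;; NFC normalization composes e + U+0301 into the single character U+00E9.
(display (string-length (string-normalize-nfc decomposed))) (newline) ; prints 1
```

Note that normalization only helps where a precomposed character exists, as it does for é. The emoji sequence stays at three characters under every normalization form, because it is one grapheme cluster but has no single-code-point equivalent.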