On Wednesday, June 26, 2024 at 13:46 +0200, Maxime Devos wrote:
> > 
> > Maybe `the number of codepoints` will work here.
> > (string-length "👨‍🏭") ;; => 3
> > (string-length "é") ;; => 2
> >
> > The number of characters here is 1 in both cases.
> 
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.


Agreed. "The number of code points" would be correct, but "the number
of characters" (i.e., the current wording) is correct too. In Scheme
terminology, a character is just a Unicode code point, as can be seen
from procedures such as char? and char->integer (the latter returns the
character's code point).
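
For illustration, a minimal sketch (assuming Guile 2.0 or later, typed
at the REPL) showing that a Scheme character is a single code point:

```scheme
;; char->integer returns the character's Unicode code point.
(char->integer #\a)                    ; => 97
(char->integer (integer->char #xE9))   ; => 233, i.e. U+00E9

;; Code points outside the BMP are still a single character.
(char? (integer->char #x1F468))        ; => #t (U+1F468 MAN)
```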


> You need to fix your setup, that’s not what Guile does.


No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT,
which is two characters unlike é, LATIN SMALL LETTER E WITH ACUTE.

Likewise 👨‍🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.
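
To make this concrete, here is a small sketch (assuming Guile 2.0 or
later) that builds both spellings of "é" and the emoji from explicit
code points, so the result does not depend on how a terminal or mail
client encodes them:

```scheme
(define precomposed (string (integer->char #xE9)))      ; U+00E9
(define decomposed (string #\e (integer->char #x301)))  ; U+0065 U+0301
(define man-factory-worker
  (string (integer->char #x1F468)    ; MAN
          (integer->char #x200D)     ; ZERO WIDTH JOINER
          (integer->char #x1F3ED)))  ; FACTORY

(string-length precomposed)          ; => 1
(string-length decomposed)           ; => 2
(string-length man-factory-worker)   ; => 3

;; List the code points of the decomposed form.
(map char->integer (string->list decomposed))  ; => (101 769)
```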

The "visual characters" are called grapheme clusters, and AFAIK Guile
doesn't provide any API that deals with grapheme clusters. (Note that
the number of grapheme clusters in a given string depends on the Unicode
database and therefore on the Unicode version.)
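
Unicode normalization is not a substitute for grapheme cluster
segmentation. A minimal sketch, assuming Guile's string-normalize-nfc
(available since Guile 2.0): NFC merges e + U+0301 into one code point
because a precomposed U+00E9 exists, but the emoji sequence has no
precomposed form and stays at three code points:

```scheme
(define decomposed (string #\e (integer->char #x301)))  ; U+0065 U+0301
(define man-factory-worker
  (string (integer->char #x1F468)    ; MAN
          (integer->char #x200D)     ; ZERO WIDTH JOINER
          (integer->char #x1F3ED)))  ; FACTORY

;; NFC composes e + COMBINING ACUTE ACCENT into U+00E9.
(string-length (string-normalize-nfc decomposed))          ; => 1

;; No precomposed form exists here, so NFC leaves it unchanged.
(string-length (string-normalize-nfc man-factory-worker))  ; => 3
```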

There are programming languages where the data type called "character"
corresponds to grapheme clusters, but I don't think this is common.
Swift is the only example I know.

Obligatory reading: https://hsivonen.fi/string-length/

