On 02/07/2023 19:20, Nikolay Nikolov via fpc-pascal wrote:
On 7/2/23 16:30, Hairy Pixels via fpc-pascal wrote:
I'm interested in parsing Unicode scalars (I think that's what they're
called) into byte-sized values, but I'm not sure where to start. The
first thing I did was choose the Unicode scalar U+1F496 (💖).
There's no such thing as "unicode scalar" in Unicode terminology:
https://unicode.org/glossary/
There seems to be one: the standard defines "Unicode scalar value" (any
code point except the surrogates), see
https://www.unicode.org/versions/Unicode10.0.0/ch03.pdf#G7404
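To make the term concrete: a Unicode scalar value is any code point in the range U+0000..U+10FFFF that is not a surrogate (U+D800..U+DFFF). A minimal sketch of that check, in Python only for illustration (the function name is hypothetical, not from any library):

```python
def is_scalar_value(cp):
    """Return True if cp is a Unicode scalar value.

    Hypothetical helper: a scalar value is any code point in
    0..0x10FFFF outside the surrogate range 0xD800..0xDFFF.
    """
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


print(is_scalar_value(0x1F496))  # the heart emoji from this thread
print(is_scalar_value(0xD800))   # a high surrogate
```

U+1F496 passes this check, while a lone surrogate such as U+D800 does not.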
Next I cheated and asked ChatGPT. :) Amazingly, from my question it was
able to tell me the scalar is made up of these 4 bytes:
 240 159 146 150
That is the UTF-8 encoded representation of that value.
You can find them on https://www.compart.com/en/unicode/U+0041
(using the hex for whatever codepoint interests you)
The question is, how was 1F496 decomposed into 4 bytes?
https://en.wikipedia.org/wiki/UTF-8#Encoding
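As a rough illustration of the scheme described on that page, here is a minimal sketch in Python (the function name is hypothetical; a Pascal version would use the same shifts and masks). The number of bytes depends on the size of the code point. The leading byte carries a length prefix (0xC0, 0xE0 or 0xF0) plus the topmost bits, and each following byte is 0x80 plus the next 6 bits:

```python
def utf8_encode(cp):
    """Encode one Unicode scalar value as a list of UTF-8 byte values.

    Hypothetical helper for illustration only; real code would
    normally rely on the language's built-in conversion routines.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:        # up to 7 bits: one byte, 0xxxxxxx
        return [cp]
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6),
                0x80 | (cp & 0x3F)]
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


print(utf8_encode(0x1F496))  # [240, 159, 146, 150]
```

For U+1F496 the 21 bits split as 000 011111 010010 010110, which gives 0xF0+0 = 240, 0x80+31 = 159, 0x80+18 = 146 and 0x80+22 = 150, the four bytes quoted above.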