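Hi Ryan,

I've created the attached unit, which takes a code point and returns the UTF-8 character as a string. It's based on the Wikipedia article on UTF-8.

UTF-8 encodes code points in one to four bytes, depending on the value of the code point. The x characters are replaced by the bits of the code point:

First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
U+0000            U+007F           0xxxxxxx
U+0080            U+07FF           110xxxxx  10xxxxxx
U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

This table is copied from Wikipedia.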
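
To make the bit replacement concrete for U+1F496: written as 21 bits, the code point is 000 011111 010010 010110. Dropping those groups into the four-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx gives 11110000 10011111 10010010 10010110, which is 240 159 146 150. Below is a minimal sketch of the same steps in Free Pascal. It is not the attached uencoding.pas; the function name CodePointToUtf8 and the choice to return an empty string for invalid input are assumptions made for illustration.

```pascal
program Utf8Sketch;

{$mode objfpc}{$H+}

// Hypothetical helper (not taken from uencoding.pas): encodes a single
// code point as UTF-8 and returns the resulting bytes in a string.
// Returns an empty string for surrogates and for values above U+10FFFF.
function CodePointToUtf8(cp: Cardinal): String;
begin
  Result := '';
  if cp <= $7F then
  begin
    // 1 byte: 0xxxxxxx
    SetLength(Result, 1);
    Result[1] := Chr(cp);
  end
  else if cp <= $7FF then
  begin
    // 2 bytes: 110xxxxx 10xxxxxx
    SetLength(Result, 2);
    Result[1] := Chr($C0 or (cp shr 6));
    Result[2] := Chr($80 or (cp and $3F));
  end
  else if cp <= $FFFF then
  begin
    // Surrogates (U+D800..U+DFFF) are not valid scalar values.
    if (cp >= $D800) and (cp <= $DFFF) then
      Exit;
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 3);
    Result[1] := Chr($E0 or (cp shr 12));
    Result[2] := Chr($80 or ((cp shr 6) and $3F));
    Result[3] := Chr($80 or (cp and $3F));
  end
  else if cp <= $10FFFF then
  begin
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 4);
    Result[1] := Chr($F0 or (cp shr 18));
    Result[2] := Chr($80 or ((cp shr 12) and $3F));
    Result[3] := Chr($80 or ((cp shr 6) and $3F));
    Result[4] := Chr($80 or (cp and $3F));
  end;
end;

var
  s: String;
  i: Integer;
begin
  s := CodePointToUtf8($1F496);
  // Print the individual byte values of the encoded string.
  for i := 1 to Length(s) do
    Write(Ord(s[i]), ' ');
  WriteLn;
end.
```

For $1F496 this prints 240 159 146 150, the same four bytes quoted in Ryan's message below. Each continuation byte carries six bits of the code point (mask $3F) behind the fixed 10 prefix, and the lead byte carries whatever bits remain behind its length prefix.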
uencoding.pas
Description: Binary data
Hope it's useful for you. If you improve the code, please let me know.

Best regards,
Jeroen

On 2 Jul 2023, at 15:30, Hairy Pixels via fpc-pascal <[email protected]> wrote:

> I'm interested in parsing Unicode scalars (I think they're called) into byte-sized values, but I'm not sure where to start. The first thing I did was choose the Unicode scalar U+1F496 (💖). Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to tell me the scalar is made up of these 4 bytes:
>
> 240 159 146 150
>
> I was able to correctly concatenate these characters, and writeln printed the correct character.
>
>   var
>     s: String;
>   begin
>     s := char(240)+char(159)+char(146)+char(150);
>     writeln(s);
>   end.
>
> The question is, how was 1F496 decomposed into 4 bytes?
>
> Regards,
> Ryan Joseph
>
> _______________________________________________
> fpc-pascal maillist - [email protected]
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
