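Hi Ryan,

I've created the attached unit, which takes a code point and returns the corresponding UTF-8 character as a string. It's based on the Wikipedia article on UTF-8:

UTF-8 encodes code points in one to four bytes, depending on the value of the code point. The x characters are replaced by the bits of the code point:

  First code point   Last code point   Byte 1     Byte 2     Byte 3     Byte 4
  U+0000             U+007F            0xxxxxxx
  U+0080             U+07FF            110xxxxx   10xxxxxx
  U+0800             U+FFFF            1110xxxx   10xxxxxx   10xxxxxx
  U+10000            U+10FFFF          11110xxx   10xxxxxx   10xxxxxx   10xxxxxx

This table is copied from Wikipedia.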
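Applied to your example (a worked illustration of the table, not taken from the attached unit): U+1F496 is above U+FFFF, so it uses the four-byte row. Its 21 payload bits are split 3 + 6 + 6 + 6 and dropped into the x positions, which gives exactly the four bytes ChatGPT reported:

```latex
\begin{aligned}
\mathtt{1F496}_{16} &= 0\,0001\,1111\,0100\,1001\,0110_2
  = \underbrace{000}_{3}\;\underbrace{011111}_{6}\;\underbrace{010010}_{6}\;\underbrace{010110}_{6}\\
\text{byte 1} &= 11110\,000_2 = \mathtt{F0}_{16} = 240\\
\text{byte 2} &= 10\,011111_2 = \mathtt{9F}_{16} = 159\\
\text{byte 3} &= 10\,010010_2 = \mathtt{92}_{16} = 146\\
\text{byte 4} &= 10\,010110_2 = \mathtt{96}_{16} = 150
\end{aligned}
```

In code, each step is a right shift by 18, 12, 6 or 0 bits, a mask with $3F (binary 111111) for the continuation bytes, and an OR with the marker bits (11110 for the lead byte, 10 for the rest).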
[Attachment: uencoding.pas]
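The attachment itself isn't reproduced in the list archive. For readers without it, here is a minimal sketch of the same table-driven approach in Free Pascal. It is not the code in uencoding.pas, and the function name CodePointToUTF8 is hypothetical. It assumes FPC 3.x (for RawByteString and SetCodePage) and rejects values that are not Unicode scalar values (the surrogates U+D800..U+DFFF and anything above U+10FFFF):

```pascal
program Utf8Sketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper (not the code in uencoding.pas): encodes one
  Unicode scalar value as UTF-8, one branch per row of the table. }
function CodePointToUTF8(CodePoint: Cardinal): RawByteString;
begin
  { Surrogates and values above U+10FFFF are not Unicode scalar values }
  if (CodePoint > $10FFFF) or ((CodePoint >= $D800) and (CodePoint <= $DFFF)) then
    raise EConvertError.Create('Not a Unicode scalar value');

  if CodePoint <= $7F then
  begin
    { 0xxxxxxx: ASCII, a single byte }
    SetLength(Result, 1);
    Result[1] := Chr(CodePoint);
  end
  else if CodePoint <= $7FF then
  begin
    { 110xxxxx 10xxxxxx }
    SetLength(Result, 2);
    Result[1] := Chr($C0 or (CodePoint shr 6));
    Result[2] := Chr($80 or (CodePoint and $3F));
  end
  else if CodePoint <= $FFFF then
  begin
    { 1110xxxx 10xxxxxx 10xxxxxx }
    SetLength(Result, 3);
    Result[1] := Chr($E0 or (CodePoint shr 12));
    Result[2] := Chr($80 or ((CodePoint shr 6) and $3F));
    Result[3] := Chr($80 or (CodePoint and $3F));
  end
  else
  begin
    { 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx }
    SetLength(Result, 4);
    Result[1] := Chr($F0 or (CodePoint shr 18));
    Result[2] := Chr($80 or ((CodePoint shr 12) and $3F));
    Result[3] := Chr($80 or ((CodePoint shr 6) and $3F));
    Result[4] := Chr($80 or (CodePoint and $3F));
  end;

  { Label the bytes as UTF-8 without converting them }
  SetCodePage(Result, CP_UTF8, False);
end;

var
  S: RawByteString;
  I: Integer;
begin
  S := CodePointToUTF8($1F496);
  { Print the byte values rather than the glyph, so the output does not
    depend on the console's code page. Should print: 240 159 146 150 }
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Write(' ');
    Write(Ord(S[I]));
  end;
  WriteLn;
end.
```

Filling the string byte by byte and only then tagging it with SetCodePage(..., False) avoids any implicit code-page conversion of the high bytes along the way, which can happen when Char values are concatenated into a string whose code page is the system default rather than UTF-8.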
Hope it's useful for you. If you improve the code, please let me know.

Best regards,
Jeroen

> On 2 Jul 2023, at 15:30, Hairy Pixels via fpc-pascal <fpc-pascal@lists.freepascal.org> wrote:
>
> I'm interested in parsing Unicode scalars (I think that's what they're called) into byte-sized values, but I'm not sure where to start.
>
> The first thing I did was choose the Unicode scalar U+1F496 (💖). Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to tell me the scalar is made up of these 4 bytes:
>
> 240 159 146 150
>
> I was able to concatenate these characters correctly, and writeln printed the correct character.
>
>   var
>     s: String;
>   begin
>     s := char(240) + char(159) + char(146) + char(150);
>     writeln(s);
>   end.
>
> The question is, how was 1F496 decomposed into 4 bytes?
>
> Regards,
> Ryan Joseph
>
> _______________________________________________
> fpc-pascal maillist - fpc-pascal@lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal