Re: [fpc-pascal] Parse unicode scalar

Hairy Pixels via fpc-pascal Sun, 02 Jul 2023 18:29:43 -0700


> On Jul 2, 2023, at 11:16 PM, Jer Haan <jdehaan2...@gmail.com> wrote:
> 
> This table is copied from Wikipedia. [attachment: uencoding.pas] Hope it's
> useful for you. If you improve the code pls let me know.
>


This is perfect, thanks! Much more complicated than I thought.

I'm curious now: if you were going the other direction and parsing a string of
different unicode characters with different code point sequence lengths, how
would you know which length each one was? For example, I started off knowing
which unicode scalar to use by looking at a table, but what if I had to find
the character in a stream of text?

I think UTF-8 characters can be 1 to 4 bytes long, so you could encounter
1-byte characters interleaved with 4-byte characters, and there's no header or
terminator for each character. How is this solved?
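For reference, a minimal sketch of the usual answer, under stated assumptions:
UTF-8 is self-synchronizing, so the high bits of each lead byte announce the
sequence length (0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4),
and every continuation byte has the form 10xxxxxx, so it can never be mistaken
for a lead byte. The program below is an illustration only, not the code in
uencoding.pas; the names Utf8SeqLen and CountCodePoints are hypothetical, and
it does not reject overlong forms, surrogates or lead bytes such as C0, C1 and
F5..F7.

```pascal
program Utf8Walk;

{$mode objfpc}{$H+}

{ Hypothetical helper: returns the length in bytes of the UTF-8 sequence
  that starts with LeadByte, or 0 if LeadByte cannot start a sequence.
  Sketch only: overlong/out-of-range leads (C0, C1, F5..F7) are not rejected. }
function Utf8SeqLen(LeadByte: Byte): Integer;
begin
  if LeadByte < $80 then          // 0xxxxxxx: ASCII, 1 byte
    Result := 1
  else if LeadByte < $C0 then     // 10xxxxxx: continuation byte, not a lead
    Result := 0
  else if LeadByte < $E0 then     // 110xxxxx: 2-byte sequence
    Result := 2
  else if LeadByte < $F0 then     // 1110xxxx: 3-byte sequence
    Result := 3
  else if LeadByte < $F8 then     // 11110xxx: 4-byte sequence
    Result := 4
  else
    Result := 0;                  // F8..FF never occur in UTF-8
end;

{ Hypothetical helper: counts code points by jumping from lead byte to
  lead byte. Returns -1 on a bad lead byte, a truncated sequence, or a
  following byte that is not of the form 10xxxxxx. }
function CountCodePoints(const S: RawByteString): Integer;
var
  I, J, Len: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Len := Utf8SeqLen(Ord(S[I]));
    if (Len = 0) or (I + Len - 1 > Length(S)) then
      Exit(-1);
    for J := I + 1 to I + Len - 1 do
      if (Ord(S[J]) and $C0) <> $80 then
        Exit(-1);
    Inc(Result);
    Inc(I, Len);
  end;
end;

var
  S: RawByteString;
begin
  { 'A' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes) }
  S := 'A' + #$C3#$A9 + #$E2#$82#$AC + #$F0#$9F#$98#$80;
  WriteLn(Length(S), ' bytes, ', CountCodePoints(S), ' code points');
  // prints "10 bytes, 4 code points"
end.
```

Because a continuation byte can never look like a lead byte, a decoder that
lands in the middle of a character can also skip forward to the next byte that
is not 10xxxxxx and resynchronize from there.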

Regards,
Ryan Joseph

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
