On 04/29/2016 11:09 AM, Graeme Geldenhuys wrote:

> No, because UTF-8 doesn't use surrogate pairs.

Really?

I understand that "surrogate pairs" means combining a printable character (i.e. one of the roughly 1.1 million Unicode code points) with another one, so that together they form a different printable character (e.g. "A" plus "add two dots above" to create "Ä").
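A minimal Python sketch of the composition described above, using only the standard `unicodedata` module. In Unicode terms this is a combining character sequence, and the example works purely on code points, independent of any UTF encoding:

```python
import unicodedata

# "A" followed by U+0308 COMBINING DIAERESIS: two code points
decomposed = "A\u0308"

# NFC normalization composes them into the single code point U+00C4
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(unicodedata.name(composed))      # LATIN CAPITAL LETTER A WITH DIAERESIS
```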

Now a sequence of 32-bit code points can be encoded either as a series of UTF-8 bytes or as a series of UTF-16 words. Either form is usually much shorter (measured in bytes) than the same text stored as UTF-32.
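To put numbers on that size comparison, here is a small Python sketch; the helper `encoded_sizes` and the sample strings are illustrative only. It also shows where "usually" matters: for CJK text UTF-8 is larger than UTF-16, and a code point above U+FFFF takes two 16-bit UTF-16 units (a surrogate pair), i.e. four bytes, the same as UTF-32.

```python
def encoded_sizes(text: str) -> tuple[int, int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes)."""
    # The "-le" codecs omit the byte-order mark Python would otherwise prepend
    return (len(text),
            len(text.encode("utf-8")),
            len(text.encode("utf-16-le")),
            len(text.encode("utf-32-le")))

print(encoded_sizes("Hello"))       # (5, 5, 10, 20)
print(encoded_sizes("\u00c4"))      # (1, 2, 2, 4)    precomposed "Ä"
print(encoded_sizes("日本"))        # (2, 6, 4, 8)
print(encoded_sizes("\U0001F600"))  # (1, 4, 4, 4)    emoji above U+FFFF

# Above U+FFFF, UTF-16 needs two 16-bit units: a surrogate pair
print("\U0001F600".encode("utf-16-be").hex())  # d83dde00
```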

So the UTF-8 vs. UTF-16 question sits at a lower layer: the encoding.
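As a small illustration of that layering, this Python sketch (variable names are arbitrary) encodes the two-code-point sequence "A" + U+0308 both ways. The bytes differ, but decoding either byte string gives back the same code points, so anything built on code points never sees which encoding was used:

```python
# "A" followed by U+0308 COMBINING DIAERESIS: two code points
text = "A\u0308"

# Encoding layer: the same code points map to different byte sequences
as_utf8 = text.encode("utf-8")       # b'A\xcc\x88'      (3 bytes)
as_utf16 = text.encode("utf-16-le")  # b'A\x00\x08\x03'  (4 bytes)

# Decoding either byte sequence recovers the identical code points
for raw, codec in [(as_utf8, "utf-8"), (as_utf16, "utf-16-le")]:
    decoded = raw.decode(codec)
    print(codec, [f"U+{ord(c):04X}" for c in decoded])
# utf-8 ['U+0041', 'U+0308']
# utf-16-le ['U+0041', 'U+0308']
```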

-Michael
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
