Hello Michael,

On 2016-04-29 at 11:23 you wrote:
> > No, because UTF-8 doesn't use surrogate pairs.
> Really ?

Yes.

> those to be combined to a different printable thingy (/e.g. "A" plus
> "add two dots above" to create a "Ä").

No, that is something totally different and not what I was talking
about. You are referring to combining diacritics: two or more
code-points (think "characters") combined to make what looks like a
single new character on screen or in print.

> Both of which usually is much shorter (measured in bytes) than the
> uncompressed UTF32 information.

You didn't use the standard terminology, but I think you are referring
to the composed and decomposed forms of a character. For example:

  e (U+0065) + ̈ (U+0308) = ë   (2 code-points used)
vs
  ë (U+00EB)                    (1 code-point used)

The first example above is the decomposed version of ë. The second
example above is the composed version of ë.

The decomposed versions are the form preferred and recommended by the
Unicode Consortium. They (the Unicode Consortium) only included the
composed versions for backward compatibility with the character sets
that already existed when the Unicode standard was established. No new
composed code-points will be added to the Unicode standard.

Anyway, I was referring to surrogate pairs (which apply to UTF-16
only), not composed/decomposed glyphs.

Regards,
  Graeme

_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
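
P.S. The two concepts are easy to tell apart with a few lines of code. The sketch below is in Python, only because its standard library ships the Unicode normalization tables; it is an illustration, not anything FPC-specific. The helper names (describe, utf16_units) and the choice of U+1D11E as a test character are my own assumptions, picked purely for demonstration.

```python
import unicodedata

# Decomposed form: base letter followed by a combining diaeresis.
decomposed = "e\u0308"                               # U+0065 + U+0308
# Composed form: NFC normalization folds the pair into one code point.
composed = unicodedata.normalize("NFC", decomposed)  # U+00EB

def describe(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

def utf16_units(s):
    """Return the 16-bit code units of s when encoded as UTF-16."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A character outside the Basic Multilingual Plane (MUSICAL SYMBOL G CLEF).
clef = "\U0001D11E"

print(describe(decomposed))                  # ['U+0065', 'U+0308']
print(describe(composed))                    # ['U+00EB']
print([hex(u) for u in utf16_units(clef)])   # ['0xd834', '0xdd1e']
print(clef.encode("utf-8").hex())            # f09d849e
```

The first two lines of output show composition versus decomposition: same visible character, different number of code-points. The third shows a surrogate pair: one code-point that UTF-16 has to split into two 16-bit units. The last line shows the same code-point in UTF-8, which is simply a four-byte sequence with no surrogates involved.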