Re: [Lazarus] UTF16 2 utf8

Hans-Peter Diettrich Thu, 05 May 2011 06:46:31 -0700

José Mejuto wrote:

I think that the text saying UCS-2 has been extended does not mean that
UCS-2 itself was extended; it says that UCS-2 was extended into UTF-16,
so UCS-2 can no longer be considered Unicode, as noted in ISO 10646:


UCS-2. UCS-2 stands for "Universal Character Set coded in 2 octets" and is also known as "the two-octet BMP form." It was documented in earlier editions of 10646 as the two-octet (16-bit) encoding consisting only of code positions for plane zero, the Basic Multilingual Plane. This documentation has been removed from ISO/IEC 10646:2011, and the term UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard.

I agree that UCS-2 no longer covers the current Unicode range, but it is still a true subset of UCS-4 (the BMP).
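To make the BMP-subset point concrete, here is a small sketch in Python (chosen only for illustration; `encode_ucs2_be` is a hypothetical helper, not a Lazarus or FPC routine). It stores each code point as one 16-bit big-endian unit and rejects anything above U+FFFF, plus the reserved surrogate range U+D800..U+DFFF. That rejection is exactly where UCS-2 and UTF-16 part ways.

```python
def encode_ucs2_be(text: str) -> bytes:
    """Encode text as UCS-2, big-endian: one 16-bit unit per code point, BMP only."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # UCS-2 has no surrogate mechanism, so code points beyond U+FFFF cannot be
        # represented; U+D800..U+DFFF are reserved for UTF-16 and are not characters.
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} cannot be encoded in UCS-2")
        out += cp.to_bytes(2, "big")
    return bytes(out)
```

For BMP-only text the result is byte-for-byte identical to Python's `utf-16-be` codec, and zero-extending each unit to 32 bits gives the UCS-4 (UTF-32) value, which is the subset relation. A character such as U+1F600 raises `ValueError` here, whereas `"\U0001F600".encode("utf-16-be")` produces the surrogate pair `b"\xd8\x3d\xde\x00"`.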

The UCS standards define Unicode as ranges of values, while the UTF standards define encodings.
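The value/encoding distinction can be seen by pushing a single code point through several UTF encoders. The sketch below uses only Python's standard codecs; `encodings_of` is a hypothetical helper name used for illustration.

```python
def encodings_of(cp: int) -> dict[str, str]:
    """Return the byte sequences (as spaced hex) of one code point in several UTFs."""
    ch = chr(cp)  # the abstract value from the UCS repertoire
    return {name: ch.encode(name).hex(" ") for name in ("utf-8", "utf-16-be", "utf-32-be")}
```

For example, `encodings_of(0xE9)` gives `{"utf-8": "c3 a9", "utf-16-be": "00 e9", "utf-32-be": "00 00 00 e9"}`: one value, three different encodings.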

The UTF-7/8 encodings are purely numerical compression schemes, while UTF-16 (with surrogate pairs) more closely reflects the tree-like structure of "planes", "groups", "blocks", "codepages", etc., favored by the Unicode Consortium. Such a view may be interesting to font designers, who can restrict a font to part of the full Unicode range, but it is of little help when handling Unicode programmatically.
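Since the thread is about UTF-16 to UTF-8 conversion, here is a hand-written sketch of that path in Python. It is illustrative only: `utf16_units_to_utf8`, `encode_utf8` and `plane_of` are hypothetical names, and the input is assumed to be a list of 16-bit code units. Decoding a surrogate pair recovers the code point (and with it the plane), while the UTF-8 side is plain bit packing selected by value range.

```python
def plane_of(cp: int) -> int:
    """Unicode plane number: 0 is the BMP, 1..16 are the supplementary planes."""
    return cp >> 16


def encode_utf8(cp: int) -> bytes:
    """Pack one code point into 1-4 UTF-8 bytes, chosen purely by numeric range."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def utf16_units_to_utf8(units: list[int]) -> bytes:
    """Convert a sequence of UTF-16 code units to UTF-8, rejecting unpaired surrogates."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            # The pair carries 20 bits; (u - 0xD800) >> 6 equals plane_of(cp) - 1.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            cp = u  # a BMP code unit is its own code point (the UCS-2 case)
            i += 1
        out += encode_utf8(cp)
    return bytes(out)
```

For example, `utf16_units_to_utf8([0x0041, 0x00E9, 0xD83D, 0xDE00])` returns `b"A\xc3\xa9\xf0\x9f\x98\x80"`, the same bytes as `"A\u00e9\U0001F600".encode("utf-8")`.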


DoDi


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
