José Mejuto wrote:

I think the text saying that UCS2 has been extended does not mean
that UCS2 itself has been extended; it says that UCS2 has been extended
to UTF-16, so UCS2 can no longer be considered Unicode, as noted in
ISO 10646:

UCS-2. UCS-2 stands for "Universal Character Set coded in 2 octets" and is
also known as "the two-octet BMP form." It was documented in earlier
editions of 10646 as the two-octet (16-bit) encoding consisting only of code
positions for plane zero, the Basic Multilingual Plane. This documentation
has been removed from ISO/IEC 10646:2011, and the term UCS-2 should now be
considered obsolete. It no longer refers to an encoding form in either 10646
or the Unicode Standard.

I agree that UCS-2 no longer covers the current Unicode range, but it is still a true subset of UCS-4 (the BMP).
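
To make the subset relation concrete, here is a minimal Python sketch. The names BMP_LAST and ucs2_units are made up for this illustration and come from no library; the only assumption is that UCS-2 stores each code point of plane 0 directly as one 16-bit value.

```python
BMP_LAST = 0xFFFF  # highest code point of plane 0 (hypothetical constant name)

def ucs2_units(text):
    """Return the 16-bit UCS-2 values for text (hypothetical helper).

    Inside the BMP the UCS-2 value is numerically identical to the
    UCS-4 value; anything above U+FFFF has no UCS-2 representation.
    """
    units = []
    for ch in text:
        cp = ord(ch)  # the UCS-4 value of the character
        if cp > BMP_LAST:
            raise ValueError("U+%04X lies outside the BMP" % cp)
        units.append(cp)
    return units
```

For text that stays inside the BMP the result equals the list of UCS-4 values; a character such as U+1F600 raises ValueError, which is exactly the part of the range UCS-2 cannot reach.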

The UCS standards define Unicode as ranges of values, while the UTF standards define encodings.

The UTF-7/8 encodings are purely numerical compression schemes, while UTF-16 (with surrogate pairs) reflects more of the tree-like structure of "planes", "groups", "blocks", "codepages" etc. favored by the Unicode Consortium. Such a view may be interesting to font writers, who can restrict a font to part of the full Unicode range, but it is of little help when handling Unicode programmatically.
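
As a rough illustration of the two schemes, the following Python sketch computes a UTF-16 surrogate pair and a UTF-8 byte sequence by hand. The function names to_surrogate_pair and utf8_bytes are invented for this example, and the code skips validation that a real encoder needs (for instance, utf8_bytes does not reject surrogate code points or values above U+10FFFF).

```python
def to_surrogate_pair(cp):
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = cp - 0x10000              # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)     # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits
    return high, low

def utf8_bytes(cp):
    """Encode one code point as UTF-8 (sketch, no range validation)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For U+1F600, to_surrogate_pair returns (0xD83D, 0xDE00), and utf8_bytes agrees with Python's built-in "utf-8" codec. Both functions only shift and mask the numeric value; neither needs to know which block or script the character belongs to.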

DoDi


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
