"Peter S. Housel" wrote:
> > o Complete disdain for ISO-10646 being 32 bits, when 16
> > of them are never anything but 0, and were put there just
> > so that people could grep -v other people's languages out
> > of documents
> >
> > o I'll believe Hieroglyphics and Linear B when I see the
> > fonts and the programs that use them.  Dead languages
> > pretty much justify purpose-built linguistics software
> > anyway.
> 
> If you were a MathML user, or had a Chinese name using an obscure character,
> you would probably feel differently.

Why?  Have the Chinese sent representatives to an international
standards body to get code pages other than 0 filled in with
these characters?  Have the MathML users?

Basically, it's not necessary to have bits to represent these
code points until they are parts of a standard character set.
The entire point of Unicode was to provide round-trip capability
between character sets.

For MathML, you can actually unify the code points with Zapf or
other characters that don't exist simultaneously in any character
set.  Alternately, you could use a "private use" area.


> > o A desire for raw storage of Unicode, rather than UTF-8 or
> > UTF-7 encoding.  This last one is:
> 
> You still need at least 21 bits to have "raw storage of Unicode".  With
> anything less, either UTF-16 surrogates or UTF-8 multi-byte encodings have
> to be used.  With a 16-bit wchar_t, even if I personally don't have any text
> that uses characters beyond the BMP, I still have to write my code to
> account for surrogates.

Unicode 3.2.0 is not an ISO/IEC standard.  It's a political thing.

You might have an argument for ISO-10646-2:2001; however "Klingon"
is not a script I'm really worried about.  8-).
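
For concreteness, the surrogate handling Peter describes comes to
roughly the following.  This is only a minimal sketch in C; the name
utf16_decode is hypothetical and not taken from any library, and it
assumes the input is already an array of 16-bit code units in host
byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one code point from an array of UTF-16 code units.
 * Returns the number of units consumed (1 or 2), or 0 if the
 * input is empty or starts with an unpaired surrogate.
 * (Hypothetical helper, for illustration only.)
 */
size_t
utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
	if (n == 0)
		return 0;

	/* Anything outside D800..DFFF is a plain BMP character. */
	if (s[0] < 0xD800 || s[0] > 0xDFFF) {
		*cp = s[0];
		return 1;
	}

	/* A high surrogate must be followed by a low surrogate. */
	if (s[0] <= 0xDBFF && n >= 2 &&
	    s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
		*cp = 0x10000 +
		    (((uint32_t)(s[0] - 0xD800) << 10) |
		    (uint32_t)(s[1] - 0xDC00));
		return 2;
	}

	return 0;	/* unpaired surrogate */
}
```

The largest value a pair can produce is 0x10FFFF, which is where the
21-bit figure comes from.  Every loop that walks a 16-bit string has
to make this check, whether or not the text at hand ever leaves the
BMP.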


> > o People might accept doubling data size for the benefit
> > of internationalization.  They aren't going to accept
> > a random multiplier between 1 and 5.
> 
> I suspect UTF-16 doesn't compress very well using standard tools, and it is
> subject to byte-order difficulties.  (That goes double for UTF-32, of
> course.)  wchar_t probably shouldn't be directly used for storage.

Anything larger than a byte has byte order problems; that was one
of the original rationales for UTF-8 encoding.
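
To illustrate both points (no byte order, variable length), here is
a minimal sketch of a UTF-8 encoder in C.  The name utf8_encode is
hypothetical, and the sketch assumes the four-byte form that covers
code points up to U+10FFFF; the older definitions of UTF-8 allowed
longer sequences for larger values.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into out[], which must hold at
 * least 4 bytes.  Returns the number of bytes written, or 0 if
 * cp is a surrogate or lies beyond U+10FFFF.
 * (Hypothetical helper, for illustration only.)
 */
size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
	if (cp < 0x80) {			/* 7 bits: ASCII */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* 11 bits */
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {			/* 16 bits */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;		/* surrogates are not characters */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* 21 bits */
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;				/* out of range */
}
```

The output is defined as a sequence of bytes, so it reads the same
on any host and needs no byte-order mark.  The cost is the return
value: the size multiplier depends on which characters the text
happens to contain.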

-- Terry
