On Tue, Jun 05, 2001 at 04:44:46PM -0700, Russ Allbery wrote:
> NeonEdge <[EMAIL PROTECTED]> writes:
>
> > This is evident in the "Musical Symbols" and even "Byzantine Musical
> > Symbols". Are these character sets more important than the actual
> > language character sets being denied to the other countries? Are musical
> > and mathematical symbols even a language at all?
>
> At the same time as 246 Byzantine Musical Symbols and 219 Musical Symbols
> were added, 43,253 Asian language ideographs were added. I fail to see
> the problem.
>
> Musical and mathematical symbols are certainly used more frequently than
> ancient Han ideographs that have been obsolete for 2,000 years, and it's
> not like the ideographs are having major difficulties being added to
> Unicode either.
>
> If the author of the original paper referred to here thinks there are
> still significant characters missing from Unicode, he should stop whining
> about it and put together a researched proposal. That's what the
> Byzantine music researchers did, and as a result their characters have now
> been added. This is how standardization works. You have to actually go
> do the work; you can't just complain and expect someone else to do it for
> you.
(as a lurker on the unicode list ([EMAIL PROTECTED]), where the link
to the opinion under discussion was also posted)
Exactly.
As another data point, once in a while someone on the list asks "what
about Egyptian hieroglyphics? Unicode can't be all-encompassing,
nyahnyahnyah!" Well, there the situation is that there *is* slowly
ongoing work between the egyptologists and the Unicode people to get
all the stork-atop-a-hippo-facing-left glyphs encoded; it's just that
the egyptologists themselves have a hard time agreeing on what the
canonical set of glyphs would actually be. There is a process for
getting more characters into Unicode, but the Unicode people cannot be
experts in all possible scripts. No proposals, no encodings.
Another constant source of confusion (which is at least part of
the Asian discontent) is that Unicode encodes abstract characters,
not any particular rendering (fonts). (There are some exceptions
to this, but they are mainly there to guarantee a safe round-trip
to Unicode and back for legacy characters.) For example, bold-a
is the same as italic-a is the same as plain-a. The same principle
was behind the "Han unification". Sometimes it would be preferable
to decompose characters to be more flexible and future-proof.
For example, the number of codepoints for Han could be dramatically
reduced if there were an agreed-upon way to electronically decompose
the glyphs into radicals, but it seems (I am not an expert on this,
mind) that there isn't, and we have to deal with tens of thousands
of them.
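
To make the abstract-character and decomposition points a bit more
concrete, here is a minimal sketch in Python, using only the standard
unicodedata module. The language and the two sample characters are my
own picks for illustration, not anything from the thread: a fullwidth
"A" stands in for the legacy round-trip characters, and a precomposed
e-acute stands in for a character that can be decomposed.

```python
import unicodedata

# A legacy "compatibility" character, encoded so that old East Asian
# character sets can round-trip through Unicode.
fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A

# Compatibility normalization (NFKC) folds it back to the abstract
# character it is a rendering variant of.
folded = unicodedata.normalize("NFKC", fullwidth_a)

# A precomposed letter can be split into a base letter plus a
# combining mark by canonical decomposition (NFD).
e_acute = "\u00E9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", e_acute)

print(folded)                                     # A
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```

Note that this only shows decomposition for a Latin letter; it says
nothing about decomposing Han ideographs into radicals, which is the
part that (as far as I know) has no agreed-upon scheme.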
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen