On Apr 22, 2004, at 9:01 AM, Dan Sugalski wrote:
At 8:51 AM -0700 4/22/04, Jeff Clites wrote:
On Apr 22, 2004, at 8:31 AM, Dan Sugalski wrote:
At 6:03 PM -0600 4/21/04, kj wrote:
The URL above goes to a useful table for working with johab. I do know it is a legacy charset, but I don't know how much it is still used. Technically, ASCII is legacy, too. :)
Ah, at this point Unicode's legacy too. Besides, as long as RAD-50 lives, nobody's got much standing to call a character set "Legacy" :)
Unicode is an actively evolving standard. It's far from legacy.
That evolution is what does it--every deployed version of Unicode is legacy, as there's always something to supplant it. Which arguably makes things worse in some cases--I'm waiting for us to run into problems when we start handing Unicode 4.0-compatible text off to system services expecting 3.x or 2.x code. Made worse in some ways because almost nobody'll notice, since most everyone we have doing stuff can get by with what the 2.0 standard provides.
Take a look at the following two pages for information on how the Unicode standard deals with change. It's exceedingly conservative, and designed specifically so that the sorts of problems you seem to be worrying about do not, in fact, exist. The point of revisions is mainly to add new characters. Of course a system based on an older revision of the standard will not know about these characters, but since day 1 systems have needed to deal gracefully with unassigned code points. It's a non-problem.
http://www.unicode.org/faq/cope_change.html
http://www.unicode.org/standard/stability_policy.html
Unicode has been carefully designed with this sort of stability under change (or backwards compatibility, if you will) in mind.
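
To make "deal gracefully with unassigned code points" concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular system service works: the function names describe and round_trip are made up for this example, and Python's unicodedata module stands in for "a system based on an older revision", since it knows only the Unicode version it was built with. Code points it does not know report general category "Cn".

```python
import unicodedata

def describe(text):
    """Label each character, tolerating code points unknown to this database.

    Hypothetical helper for illustration only.
    """
    result = []
    for ch in text:
        if unicodedata.category(ch) == "Cn":
            # Not assigned in the Unicode version this Python was built with:
            # keep the character and label it rather than rejecting the text.
            result.append((ch, "unassigned"))
        else:
            # Assigned characters without a name (e.g. controls) get a default.
            result.append((ch, unicodedata.name(ch, "unnamed")))
    return result

def round_trip(text):
    # Encoding and decoding do not depend on whether a code point is assigned,
    # so unknown characters pass through unchanged.
    return text.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    # Shows which Unicode version this interpreter's database implements.
    print(unicodedata.unidata_version)
    print(describe("a\uffff"))
```

The design point is simply that unknown code points are carried along and preserved, never dropped or treated as errors, so text produced under a newer revision survives a trip through older code.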
Jeff