Raul Miller:
> Which implies that this mechanism isn't useful for representing different
> languages in the same document. That, instead, it's logically equivalent
> to a MIME declaration of the document's language.

I don't know where you got this impression, but it's wrong. Read the
document. It introduces a LANGUAGE TAG character, ASCII-equivalent tag
characters, and a CANCEL TAG character. <EN-US>You can label text like
this.<DE-DE>Ja, du kannst.<CANCEL TAG>
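To make that concrete, here's a minimal sketch (Python is my choice for
illustration, not anything from the proposal itself) of how such a
sequence is built from the Plane 14 code points: U+E0001 LANGUAGE TAG,
the tag clones of printable ASCII at U+E0020..U+E007E, and U+E007F
CANCEL TAG.

    LANGUAGE_TAG = '\U000E0001'   # starts a language tag
    CANCEL_TAG   = '\U000E007F'   # returns to untagged text

    def lang(code):
        # Spell out an RFC 1766 language code with the tag clones of
        # ASCII: each tag character is its ASCII original plus 0xE0000.
        return LANGUAGE_TAG + ''.join(chr(ord(c) + 0xE0000) for c in code)

    text = (lang('en-US') + 'You can label text like this. '
            + lang('de-DE') + 'Ja, du kannst.'
            + CANCEL_TAG)

    # A display engine that ignores tag characters shows only the plain
    # text; a protocol that understands them can recover exactly where
    # each language starts and stops.
    print([f'U+{ord(c):05X}' for c in text if ord(c) >= 0xE0000])

Nothing stops you from switching languages as many times as you like
within one document, which is precisely the point.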
> Please explain why it matters to the reader whether the letter A is
> classified by the Unicode Consortium as mathematical [or not]?

Because in theory, MATHEMATICAL ITALIC CAPITAL A won't be available on
every keyboard, nor in every font. Any software that translated
ordinary, non-mathematical italic characters into the MATHEMATICAL
ITALICs would be non-conformant to the Unicode standard. They shouldn't
obey case mappings, and HTML markup and the like probably won't and
shouldn't work on them. There's no way most people will be able to
enter them without setting up fairly unusual software.

As a reader, you probably couldn't tell whether my message was in
KOI8-R with the Cyrillic lookalike characters used wherever possible,
but that wouldn't make it more correct or more likely.
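Both claims are easy to check mechanically. A small illustration (again
Python, my own example, assuming an interpreter with Unicode 3.1
character data):

    import unicodedata

    plain_a = 'A'            # U+0041 LATIN CAPITAL LETTER A
    math_a  = '\U0001D434'   # U+1D434 MATHEMATICAL ITALIC CAPITAL A
    cyr_a   = '\u0410'       # U+0410 CYRILLIC CAPITAL LETTER A

    # The ordinary letter has a case mapping; the mathematical variant
    # has none in UnicodeData.txt, so lower() leaves it unchanged.
    print(plain_a.lower())            # a
    print(math_a.lower() == math_a)   # True

    # The Cyrillic letter is a visual twin of Latin A in most fonts,
    # yet it is a distinct character and compares unequal.
    print(cyr_a == plain_a)           # False
    print(unicodedata.name(math_a))   # MATHEMATICAL ITALIC CAPITAL A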
> I disagree. The Han Unification issue is more like the difference
> between the Latin and the italic character sets. Yes, many characters
> are similar; however, there are also some characters which are unique
> to each representation.

Japanese who have no knowledge of Chinese can travel in China and use
"Japanese" ideographs to communicate with the Chinese people they meet.
That's a telling sign that the characters being used are fundamentally
the same characters. Yes, there are characters that are written
differently, and there are unique characters; the same is true of any
two languages that use the Latin script. I'm not arguing that all the
unifications of individual characters were correct, but the fundamental
concept of unification is correct. (It's interesting that it's almost
always the Japanese who complain about the unification; the Koreans and
Chinese, for the most part, seem to find the variations introduced by
unification to be normal. One of the main forces behind unification was
Chinese, with GB 13000.)

> And, this could be rectified -- with Unicode 3.1, they have the code
> space to represent each major representation of the character set.

Actually, it can't be rectified. The code space has existed for almost
half a decade; the only change is that it's being used now. But part of
the fundamental nature of Unicode is the unification of CJK characters.
You cannot change the meaning of 50,000 characters in the Unicode
standard and invalidate all Japanese/Chinese/Korean (pick two) data in
Unicode, any more than you can introduce case-up and case-down control
characters into ASCII and reuse the space of the lowercase letters for
something else.

> However, Unicode is not a mature standard, so we need to be careful in
> places where it would cause problems.

What? It's not mature? The majority of the world's desktops use, or
will soon use, Unicode, as it's fundamental to Mac OS X and Windows
NT/2000/ME. It's been around for ten years now, and has reached the
point where it's fundamentally stagnant. Sure, there will be a few more
ideographs, a few more mathematical characters, a few more
obscure/dead/minority scripts encoded, but Unicode 3.1 is basically
what Unicode 5.9 will be. The Unicode people are committed to not
breaking backward compatibility, and with the wealth of support many of
them have put into Unicode, they can't afford to change anything major.
It may be wrong, but it's mature.

> But that still leaves us with the "JIS has characters which aren't in
> Unicode" issue.

[If that's an actual issue.] All the characters from JIS X 0208 and
JIS X 0212 are in Unicode (they were among the original primary sources
of characters for Unicode). JIS X 0208 is the character set used in
ISO-2022-JP, and I believe SJIS and EUC-JP use the same set. JIS X 0213
should be completely included in Unicode, as the same Japanese body
that produces JIS X 0213 is the ISO 10646 liaison. I know that a number
of what Unicode would consider variants of already-encoded characters
were encoded in Unicode for compatibility with JIS X 0213.

Radovan Garabik:
> well, would you indicate just "this README needs japanese unicode font"
> and the user has to figure out by himself what that is,
> or "this README needs -misc-fixed-*-*-*-ja-*-*-*-*-*-*-iso10646-1"
> and the user is fubar when he does not have that font.

When would this be necessary? The appropriate fixed font should get
picked by locale (it's in xterm now; I don't know whether the Debian
unstable xterm has it, or whether it will be in XFree86 4.1 or 4.2). So
the issue only arises when a user has made an inappropriate choice of
font (which we can't save a user from), or is reading a Chinese README
in a Japanese locale or vice versa. If that is unreadable, the
knowledgeable user would know to switch fonts. At worst, it's no worse
than what we have now, where you have to change locale and font to read
a Chinese README in a Japanese locale.

-- 
David Starner - [EMAIL PROTECTED], [EMAIL PROTECTED]