----- Original Message -----
From: Raul Miller <[EMAIL PROTECTED]>
Subject: Re: Bug#99933: Comments on Unicode

> On Fri, Jul 06, 2001 at 04:36:25AM +0100, David Starner wrote:
> > > Once unicode can act as a super set for every character set we currently
> > > support, we can use it as such. Until then, we can't.
> >
> > If Unicode were a super set for every character set that anyone needs to
> > support, it would be worthless and completely unusable.
>
> I didn't say for any character set that anyone needs to support.
> I said for every character set we currently support. I hope you see the
> difference.

With my Debian hat on, of course I see the difference. With my Unicode
hat on, there is no difference. Every small group and company has its
own character sets that it needs supported, and Debian's just another
group. Note that Unix locales tend to preferentially use standardized
character sets (JIS X 0218, ISO-8859-*), which ISO 10646 had to superset
completely.

If you have a recent version of locales installed, look in
/usr/share/i18n/charmaps, which has every character set we support for
use in iconv or locales. For actual locale charsets, look in
/etc/locale.gen. If you remove ISO-8859-* (which are all Unicode
compatible) and remove UTF-8, you're left with 11 charsets: cp1251,
tis-620, koi8-r, koi8-u, euc-tw, euc-jp, gb2312, gb18030, gbk, big5, and
big5-hkscs.

3 of these have problems: euc-tw, big5 and big5-hkscs. All three have
characters that can't be reversibly mapped to Unicode and back. euc-tw
shouldn't be a problem, as its irreversible mappings are due to
duplication of an entire CNS plane of characters, apparently due to an
encoding quirk. big5 has some characters mapped to private use segments;
I don't know if this is because glibc doesn't use Unicode 3.1 yet, or if
that represents a private use segment in big5 (the characters are
contiguous), or if they haven't been encoded in Unicode yet (unlikely,
IMO).

-- 
David Starner - [EMAIL PROTECTED], [EMAIL PROTECTED]
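
P.S. For anyone who wants to look for irreversible mappings of this kind
themselves, here is a rough sketch in Python. It is only an illustration
under some loud assumptions: it uses Python's bundled codecs, which are
not glibc's conversion tables (and include no euc-tw at all), so its
results need not match what iconv or the locales package would show. The
function names and the codec list are hypothetical, chosen for this
example. It only probes one- and two-byte sequences, so the four-byte
part of gb18030 is not covered.

```python
from itertools import product


def decodes(raw, codec):
    """Return the decoded text, or None if `raw` is not valid in `codec`."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def round_trip_failures(codec, max_len=2):
    """List (bytes, text, re-encoded bytes) for sequences that decode under
    `codec` but do not encode back to the same bytes."""
    failures = []
    for length in range(1, max_len + 1):
        for seq in product(range(256), repeat=length):
            raw = bytes(seq)
            # Skip sequences that merely start with a shorter complete
            # character; they are concatenations, not new characters.
            if any(decodes(raw[:i], codec) is not None
                   for i in range(1, length)):
                continue
            text = decodes(raw, codec)
            if text is None:
                continue
            try:
                back = text.encode(codec)
            except UnicodeEncodeError:
                back = None
            if back != raw:
                failures.append((raw, text, back))
    return failures


if __name__ == "__main__":
    # Python codec names, assumed to correspond to the charsets above.
    for name in ["cp1251", "tis_620", "koi8_r", "koi8_u", "euc_jp",
                 "gb2312", "gbk", "big5", "big5hkscs"]:
        print(name, len(round_trip_failures(name)))
```

A nonzero count for a codec means some byte sequence goes to Unicode and
comes back as something else (or not at all), which is the property
discussed above. Characters that land in the private use area would not
show up here as failures if they still round-trip; checking for those
means inspecting the decoded code points separately.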