Re: UTF-8 locales

2000-11-23 Thread Andrew Cunningham
- Original Message - From: Roger So <[EMAIL PROTECTED]> To: ; Sent: Thursday, November 23, 2000 10:05 PM Subject: Re: UTF-8 locales > On Thu, Nov 23, 2000 at 04:44:06PM +0900, Tomohiro KUBOTA wrote: > > (Mojibake is a Japanese word. How should I call this in Englis

Re: UTF-8 locales

2000-11-23 Thread Roger So
On Thu, Nov 23, 2000 at 04:44:06PM +0900, Tomohiro KUBOTA wrote: > (Mojibake is a Japanese word. How should I call this in English?) They're called (literally) "monster characters" in Hong Kong. :p -- Roger So telnet://e-fever.org spacehunt at e-f

Re: UTF-8 locales

2000-11-23 Thread Theppitak Karoonboonayanan
On Wed, Nov 22, 2000 at 10:56:02AM +0900, Tomohiro KUBOTA wrote: > At Tue, 21 Nov 2000 11:46:41 +0700, > Theppitak Karoonboonayanan <[EMAIL PROTECTED]> wrote: > > > > For conversion from number of characters to number of columns, you > > > will need to use wcwidth() or wcswidth(). > > > > I'm in

Re: UTF-8 locales

2000-11-23 Thread Tomohiro KUBOTA
Hi, At Wed, 22 Nov 2000 16:49:52 +0900, NIIBE Yutaka <[EMAIL PROTECTED]> wrote: > > However, the current woody system (with locale 2.1.97-1) has only > > one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this > > model to work well. Why only it? > > Because of "localedef". Older

Re: UTF-8 locales

2000-11-21 Thread Tom Emerson
Tomohiro KUBOTA writes: > I found glibc 2.2 supports GB18030. It has > /usr/share/i18n/charmaps/GB18030.gz as a definition of GB18030. > (It is amazing that it covers not only ideograms but also various > characters in the world, including Latin, Greek, Cyrillic, Hebrew, > Arabic, Thai, Armenian,

Re: UTF-8 locales

2000-11-21 Thread Tomohiro KUBOTA
Hi, At Mon, 20 Nov 2000 22:04:16 +1100, Roger So <[EMAIL PROTECTED]> wrote: > You might want to add HKSCS to that list :p Ok, I added it (and GCCS) to my documentation 'Introduction to I18N'. http://www.debian.org/doc/manuals/intro-i18n/ > [ regarding PRC Govt's ban of non-GB18030 compliant sof

Re: UTF-8 locales

2000-11-21 Thread Tomohiro KUBOTA
Hi, At Tue, 21 Nov 2000 11:46:41 +0700, Theppitak Karoonboonayanan <[EMAIL PROTECTED]> wrote: > > For conversion from number of characters to number of columns, you > > will need to use wcwidth() or wcswidth(). > > I'm interested in this. I used to work the th_TH locale for glibc, and > I'd lik

Re: UTF-8 locales

2000-11-20 Thread Theppitak Karoonboonayanan
On Tue, Nov 21, 2000 at 10:00:41AM +0900, Tomohiro KUBOTA wrote: > For conversion from number of characters to number of columns, you > will need to use wcwidth() or wcswidth(). I'm interested in this. I used to work the th_TH locale for glibc, and I'd like to know how to describe this conversio
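
The character-to-column conversion mentioned above can be sketched in C roughly as follows. This is only a minimal illustration, not code from the thread: the helper name `display_columns` is hypothetical, and it assumes a POSIX system where `wcswidth()` is declared in `<wchar.h>` once `_XOPEN_SOURCE` is defined.

```c
#define _XOPEN_SOURCE 700   /* asks for the wcwidth()/wcswidth() declarations */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of terminal columns needed to display the
 * wide string s, or -1 if s contains a character that the current locale
 * treats as non-printable. */
int display_columns(const wchar_t *s)
{
    assert(s != NULL);
    /* wcswidth() adds up the per-character widths reported by wcwidth(),
     * using the LC_CTYPE category of the current locale. */
    return wcswidth(s, wcslen(s));
}
```

A real program would normally call `setlocale(LC_ALL, "")` once at startup so that the widths come from the user's locale. How many columns a given character occupies (for example, Thai combining marks or East Asian wide characters) depends on the locale data, so the sketch makes no claim about specific non-ASCII results.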

Re: UTF-8 locales

2000-11-20 Thread Tomohiro KUBOTA
Hi, At Mon, 20 Nov 2000 11:28:25 -0600, David Starner <[EMAIL PROTECTED]> wrote: > As for the reason I don't use wchar_t, not all the world works in C. > Most other languages have roll-your-own support for multi-byte character > sets or provide Unicode support. It is true that languages other

Re: UTF-8 locales

2000-11-20 Thread David Starner
On Thu, Nov 16, 2000 at 08:21:26PM +0900, Tomohiro KUBOTA wrote: > I will agree with developers who dare to hard-code UTF-8 instead of > wchar_t, if they abolish the support of 8bit (or 7bit) encoding by the > software which they develop. I mean, if they need their (European- > language speakers

Re: UTF-8 locales

2000-11-20 Thread Roger So
On Mon, Nov 20, 2000 at 07:25:11PM +0900, Tomohiro KUBOTA wrote: > > BTW, I think GB18030 would be a _character set_, not _encoding_. > If so, we won't have zh_CN.GB18030 locale. In fact it is both, AFAICT; GB18030 defines the set of characters, and the way to encode them. Just like GBK. > Exam

Re: UTF-8 locales

2000-11-20 Thread Tomohiro KUBOTA
Hi, At Mon, 20 Nov 2000 01:11:02 -0700, Anthony Fok <[EMAIL PROTECTED]> wrote: > To add to that list, China has the new GB18030-2000 standard > (locale zh_CN.GB18030) which also contains many characters beyond Unicode. Interesting. I will have to mention it in my "Introduction to I18N" document

Re: UTF-8 locales

2000-11-20 Thread Anthony Fok
On Mon, Nov 20, 2000 at 11:15:57AM +0900, Tomohiro KUBOTA wrote: > > I thought this is because > > the "living" languages are all restricted to 16bit? Hmm... i might be wrong. > > Taiwan CNS 11643 character set has about 47000 ideograms. > Recently, Japan came to have a new standard JIS X 0213. T

Re: UTF-8 locales

2000-11-19 Thread Tomohiro KUBOTA
Hi, At Sun, 19 Nov 2000 22:50:54 +0100, Bernd Eckenfels <[EMAIL PROTECTED]> wrote: > Afaik UTF8 is not able to encode 32bit unicode? Strictly speaking, there is no 32bit unicode. UCS-4 character set has 31bit code space, not 32bit. UTF-8 can encode the whole UCS-4. > I thought this is becaus
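
The point that UTF-8 covers the whole 31-bit UCS-4 code space can be illustrated with a short C sketch. It is not code from the thread: the function name `utf8_encode_ucs4` is hypothetical, and it follows the original UTF-8 definition with sequences of up to six bytes. The later RFC 3629 restricts UTF-8 to U+10FFFF, that is, to at most four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-4 value with the original 1..6 byte
 * UTF-8 scheme. Returns the number of bytes written to out, or 0 if c does
 * not fit in the 31-bit UCS-4 code space. */
size_t utf8_encode_ucs4(uint32_t c, unsigned char out[6])
{
    size_t len;
    unsigned char lead;

    assert(out != NULL);

    if (c < 0x80) {                  /* 7 bits: plain ASCII byte */
        out[0] = (unsigned char)c;
        return 1;
    } else if (c < 0x800) {          /* up to 11 bits */
        len = 2; lead = 0xC0;
    } else if (c < 0x10000) {        /* up to 16 bits */
        len = 3; lead = 0xE0;
    } else if (c < 0x200000) {       /* up to 21 bits */
        len = 4; lead = 0xF0;
    } else if (c < 0x4000000) {      /* up to 26 bits */
        len = 5; lead = 0xF8;
    } else if (c <= 0x7FFFFFFF) {    /* up to 31 bits */
        len = 6; lead = 0xFC;
    } else {
        return 0;                    /* outside UCS-4 */
    }

    /* Fill the continuation bytes from the end, six payload bits each. */
    for (size_t i = len - 1; i > 0; --i) {
        out[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    /* The remaining high bits go into the lead byte. */
    out[0] = (unsigned char)(lead | c);
    return len;
}
```

With this sketch, U+20000 (the first code point of the Supplementary Ideographic Plane) comes out as the four bytes F0 A0 80 80, and the largest UCS-4 value 0x7FFFFFFF needs six bytes.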

Re: UTF-8 locales

2000-11-19 Thread David Starner
On Sun, Nov 19, 2000 at 10:50:54PM +0100, Bernd Eckenfels wrote: > On Sat, Nov 18, 2000 at 08:01:11PM -0600, David Starner wrote: > > Which includes the Chinese and Japanese, who need the characters found > > in the Supplementary Ideographic Planes, which means 4 byte characters. > > Afaik UTF8 is

Re: UTF-8 locales

2000-11-19 Thread Tom Emerson
Bernd Eckenfels writes: > Afaik UTF8 is not able to encode 32bit unicode? I thought this is because > the "living" languages are all restricted to 16bit? Hmm... I might be wrong. > Does that mean Java does not support Asian languages with its 16bit Unicode? UTF-8 can be used to encode UCS-4. > As I

Re: UTF-8 locales

2000-11-19 Thread Bernd Eckenfels
On Sat, Nov 18, 2000 at 08:01:11PM -0600, David Starner wrote: > Which includes the Chinese and Japanese, who need the characters found > in the Supplementary Ideographic Planes, which means 4 byte characters. Afaik UTF8 is not able to encode 32bit unicode? I thought this is because the "living" l

Re: UTF-8 locales

2000-11-18 Thread David Starner
On Sat, Nov 18, 2000 at 10:55:58PM -0300, Nicolás Lichtmaier wrote: > > > You are right... the i18n in Linux is not coming well, everybody seems to > > > implement their own scheme... > > > Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to > > > encourage > > > using libc's locale

Re: UTF-8 locales

2000-11-18 Thread Nicolás Lichtmaier
> > > Ok, there may be few _users_ who use UTF-8 locale. However, a > > > certain amount of _developers_ are interested in UTF-8 support _now_. > > > If UTF-8 locale is not available, I think that they tend to hard-code > > > UTF-8 encoding instead of using LOCALE. IMHO, the hard-code of UTF-8

Re: UTF-8 locales

2000-11-16 Thread Tomohiro KUBOTA
Hi, At Thu, 16 Nov 2000 09:40:26 +, Edmund GRIMLEY EVANS <[EMAIL PROTECTED]> wrote: > > You are right... the i18n in Linux is not coming well, everybody seems to > > implement their own scheme... > > Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to encourage > > using libc's

Re: UTF-8 locales

2000-11-16 Thread Edmund GRIMLEY EVANS
Nicolás Lichtmaier <[EMAIL PROTECTED]>: > > Ok, there may be few _users_ who use UTF-8 locale. However, a > > certain amount of _developers_ are interested in UTF-8 support _now_. > > If UTF-8 locale is not available, I think that they tend to hard-code > > UTF-8 encoding instead of using LOCAL

Re: UTF-8 locales

2000-11-15 Thread Nicolás Lichtmaier
> Ok, there may be few _users_ who use UTF-8 locale. However, a > certain amount of _developers_ are interested in UTF-8 support _now_. > If UTF-8 locale is not available, I think that they tend to hard-code > UTF-8 encoding instead of using LOCALE. IMHO, the hard-code of UTF-8 > is evil becaus

Re: UTF-8 locales

2000-11-14 Thread Tomohiro KUBOTA
Hi, At Tue, 14 Nov 2000 03:56:07 +0900, GOTO Masanori <[EMAIL PROTECTED]> wrote: > I guess that original glibc documents write a "currently supported > and somewhat tested locales", and Debian's glibc also follows it > (see glibc's source code localedata/SUPPORTED). > Only some UTF-8 locale are s

Re: UTF-8 locales

2000-11-13 Thread GOTO Masanori
From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Date: Mon, 13 Nov 2000 21:19:37 +0900 > However, the current woody system (with locale 2.1.97-1) has only > one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this > model to work well. Why only it? I guess that original glibc documents write a

Re: UTF-8 locales

2000-11-13 Thread David Starner
On Mon, Nov 13, 2000 at 09:19:37PM +0900, Tomohiro KUBOTA wrote: > However, the current woody system (with locale 2.1.97-1) has only > one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this > model to work well. Why only it? With the locale 2.2 packages, you can add, say, en_US.UTF-8

Re: UTF-8 locales

2000-11-13 Thread Marco d'Itri
On Nov 13, Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote: >However, the current woody system (with locale 2.1.97-1) has only >one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this >model to work well. Why only it? Maybe because most Europeans do not use Unicode nor are going to use it