- Original Message -
From: Roger So <[EMAIL PROTECTED]>
To: ;
Sent: Thursday, November 23, 2000 10:05 PM
Subject: Re: UTF-8 locales
> On Thu, Nov 23, 2000 at 04:44:06PM +0900, Tomohiro KUBOTA wrote:
> > (Mojibake is a Japanese word. How should I call this in Englis
On Thu, Nov 23, 2000 at 04:44:06PM +0900, Tomohiro KUBOTA wrote:
> (Mojibake is a Japanese word. How should I call this in English?)
They're called (literally) "monster characters" in Hong Kong. :p
--
Roger So  telnet://e-fever.org
spacehunt at e-f
On Wed, Nov 22, 2000 at 10:56:02AM +0900, Tomohiro KUBOTA wrote:
> At Tue, 21 Nov 2000 11:46:41 +0700,
> Theppitak Karoonboonayanan <[EMAIL PROTECTED]> wrote:
>
> > > For conversion from number of characters to number of columns, you
> > > will need to use wcwidth() or wcswidth().
> >
> > I'm in
Hi,
At Wed, 22 Nov 2000 16:49:52 +0900,
NIIBE Yutaka <[EMAIL PROTECTED]> wrote:
> > However, the current woody system (with locale 2.1.97-1) has only
> > one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this
> > model to work well. Why only it?
>
> Because of "localedef". Older
Tomohiro KUBOTA writes:
> I found glibc 2.2 supports GB18030. It has
> /usr/share/i18n/charmaps/GB18030.gz as a definition of GB18030.
> (It is amazing that it covers not only ideograms but also various
> characters in the world, including Latin, Greek, Cyrillic, Hebrew,
> Arab, Thai, Armenian,
Hi,
At Mon, 20 Nov 2000 22:04:16 +1100,
Roger So <[EMAIL PROTECTED]> wrote:
> You might want to add HKSCS to that list :p
Ok, I added it (and GCCS) to my documentation 'Introduction to I18N'.
http://www.debian.org/doc/manuals/intro-i18n/
> [ regarding PRC Govt's ban of non-GB18030 compliant sof
Hi,
At Tue, 21 Nov 2000 11:46:41 +0700,
Theppitak Karoonboonayanan <[EMAIL PROTECTED]> wrote:
> > For conversion from number of characters to number of columns, you
> > will need to use wcwidth() or wcswidth().
>
> I'm interested in this. I used to work the th_TH locale for glibc, and
> I'd lik
On Tue, Nov 21, 2000 at 10:00:41AM +0900, Tomohiro KUBOTA wrote:
> For conversion from number of characters to number of columns, you
> will need to use wcwidth() or wcswidth().
I'm interested in this. I used to work the th_TH locale for glibc, and
I'd like to know how to describe this conversio
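A rough sketch of the character-to-column conversion discussed in this thread. It does not call wcwidth() or wcswidth(); it approximates them from Unicode character properties, and the function names char_columns and string_columns are made up for illustration. Real wcwidth() depends on the current locale and returns -1 for non-printable characters, which this sketch ignores.

```python
import unicodedata


def char_columns(ch: str) -> int:
    """Approximate the number of terminal columns one character occupies."""
    # Combining marks (for example Thai upper vowels and tone marks) are
    # drawn on the previous character and are assumed to take no column.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide and Fullwidth characters are assumed to take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is assumed to take one column.
    return 1


def string_columns(text: str) -> int:
    """Sum the per-character estimates, roughly what wcswidth() reports."""
    return sum(char_columns(ch) for ch in text)
```

For a Thai syllable such as U+0E17 U+0E35 U+0E48 (a consonant, an upper vowel and a tone mark), the number of characters is 3 but the estimated number of columns is 1, which is why the two counts must not be confused.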
Hi,
At Mon, 20 Nov 2000 11:28:25 -0600,
David Starner <[EMAIL PROTECTED]> wrote:
> As for the reason I don't use wchar_t, not all the world works in C.
> Most other languages have roll-your-own support for multi-byte character
> sets or provide Unicode support.
It is true that languages other
On Thu, Nov 16, 2000 at 08:21:26PM +0900, Tomohiro KUBOTA wrote:
> I will agree with developers who dare to hard-code UTF-8 instead of
> wchar_t, if they abolish the support of 8bit (or 7bit) encoding by the
> softwares which they develop. I mean, if they need their (European-
> language speakers
On Mon, Nov 20, 2000 at 07:25:11PM +0900, Tomohiro KUBOTA wrote:
>
> BTW, I think GB18030 would be a _character set_, not _encoding_.
> If so, we won't have zh_CN.GB18030 locale.
In fact it is both, AFAICT; GB18030 defines the set of characters, and
the way to encode them. Just like GBK.
> Exam
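To illustrate the point that GB18030 is both a character set and an encoding, here is a small sketch using the "gb18030" codec that ships with Python. The helper name gb18030_length is hypothetical. It only shows that each character maps to a byte sequence of 1, 2 or 4 bytes and that the mapping round-trips; it says nothing about how glibc's charmap is built.

```python
def gb18030_length(ch: str) -> int:
    """Return how many bytes GB18030 uses for a single character."""
    return len(ch.encode("gb18030"))


def roundtrips(text: str) -> bool:
    """Check that text survives encoding to GB18030 and decoding back."""
    return text.encode("gb18030").decode("gb18030") == text


if __name__ == "__main__":
    # ASCII letter, a common ideogram, and a supplementary-plane ideogram.
    for ch in ("A", "\u4e2d", "\U00020000"):
        print(f"U+{ord(ch):04X}: {gb18030_length(ch)} byte(s)")
```

ASCII stays at one byte, characters inherited from GBK take two bytes, and everything else takes four bytes.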
Hi,
At Mon, 20 Nov 2000 01:11:02 -0700,
Anthony Fok <[EMAIL PROTECTED]> wrote:
> To add to that list, China has the new GB18030-2000 standard
> (locale zh_CN.GB18030) which also contains many characters beyond Unicode.
Interesting. I will have to mention it in my "Introduction to I18N"
document
On Mon, Nov 20, 2000 at 11:15:57AM +0900, Tomohiro KUBOTA wrote:
> > I thought this is because
> > the "living" languages are all restricted to 16bit? Hmm... i might be wrong.
>
> Taiwan CNS 11643 character set has about 47000 ideograms.
> Recently, Japan came to have a new standard JIS X 0213. T
Hi,
At Sun, 19 Nov 2000 22:50:54 +0100,
Bernd Eckenfels <[EMAIL PROTECTED]> wrote:
> Afaik UTF8 is not able to encode 32bit unicode?
Strictly speaking, there is no 32bit unicode. UCS-4 character set
has 31bit code space, not 32bit. UTF-8 can encode the whole UCS-4.
> I thought this is becaus
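A minimal sketch of how UTF-8, as originally defined, covers the whole 31-bit UCS-4 code space with sequences of up to 6 bytes. The function name encode_ucs4 is made up for this example. Note that the later definition of UTF-8 in RFC 3629 restricts the range to U+10FFFF (at most 4 bytes), so Python's built-in encoder cannot be used for the larger values.

```python
def encode_ucs4(code: int) -> bytes:
    """Encode one UCS-4 value (0 .. 2**31 - 1) using the original UTF-8 scheme."""
    if not 0 <= code < 2**31:
        raise ValueError("UCS-4 values occupy 31 bits")
    if code < 0x80:
        # 7-bit values are encoded as a single byte, identical to ASCII.
        return bytes([code])
    # Each entry: (exclusive upper limit, sequence length, lead-byte marker).
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if code < limit:
            break
    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx),
    # starting from the least significant end.
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code & 0x3F))
        code >>= 6
    # The remaining high bits go into the lead byte.
    return bytes([lead | code] + tail[::-1])
```

A character in the Supplementary Ideographic Plane such as U+20000 needs 4 bytes, and the largest UCS-4 value 0x7FFFFFFF needs 6.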
On Sun, Nov 19, 2000 at 10:50:54PM +0100, Bernd Eckenfels wrote:
> On Sat, Nov 18, 2000 at 08:01:11PM -0600, David Starner wrote:
> > Which includes the Chinese and Japanese, who need the characters found
> > in the Supplementary Ideographic Planes, which means 4 byte characters.
>
> Afaik UTF8 is
Bernd Eckenfels writes:
> Afaik UTF8 is not able to encode 32bit unicode? I thought this is because
> the "living" languages are all restricted to 16bit? Hmm... i might be wrong.
> Does that mean Java does not support asian languages with its 16bit Unicode?
UTF-8 can be used to encode UCS-4.
> As I
On Sat, Nov 18, 2000 at 08:01:11PM -0600, David Starner wrote:
> Which includes the Chinese and Japanese, who need the characters found
> in the Supplementary Ideographic Planes, which means 4 byte characters.
Afaik UTF8 is not able to encode 32bit unicode? I thought this is because
the "living" l
On Sat, Nov 18, 2000 at 10:55:58PM -0300, Nicolás Lichtmaier wrote:
> > > You are right... the i18n in Linux is not coming well, everybody seems to
> > > implement their own scheme...
> > > Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to encourage
> > > using libc's locale
> > > Ok, there may be few _users_ who use UTF-8 locale. However, a
> > > certain amount of _developers_ are interested in UTF-8 support _now_.
> > > If UTF-8 locale is not available, I think that they tend to hard-code
> > > UTF-8 encoding instead of using LOCALE. IMHO, the hard-code of UTF-8
Hi,
At Thu, 16 Nov 2000 09:40:26 +,
Edmund GRIMLEY EVANS <[EMAIL PROTECTED]> wrote:
> > You are right... the i18n in Linux is not coming well, everybody seems to
> > implement their own scheme...
> > Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to encourage
> > using libc's
Nicolás Lichtmaier <[EMAIL PROTECTED]>:
> > Ok, there may be few _users_ who use UTF-8 locale. However, a
> > certain amount of _developers_ are interested in UTF-8 support _now_.
> > If UTF-8 locale is not available, I think that they tend to hard-code
> > UTF-8 encoding instead of using LOCAL
> Ok, there may be few _users_ who use UTF-8 locale. However, a
> certain amount of _developers_ are interested in UTF-8 support _now_.
> If UTF-8 locale is not available, I think that they tend to hard-code
> UTF-8 encoding instead of using LOCALE. IMHO, the hard-code of UTF-8
> is evil becaus
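A small sketch of the alternative to hard-coding UTF-8 that this message argues for: ask the locale which encoding is in use and decode with that. The function names locale_codeset and decode_input are hypothetical, and the sketch assumes a Unix-like system where nl_langinfo(CODESET) is available. The fallback to ASCII is this example's own choice.

```python
import codecs
import locale


def locale_codeset() -> str:
    """Return the encoding name announced by the user's locale."""
    try:
        # Adopt the environment settings (LC_ALL, LC_CTYPE, LANG).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested locale is not installed; stay in the default locale.
        pass
    name = locale.nl_langinfo(locale.CODESET) or "ascii"
    try:
        codecs.lookup(name)
    except LookupError:
        # Python has no codec under this name; fall back to ASCII.
        name = "ascii"
    return name


def decode_input(data: bytes) -> str:
    """Decode bytes with the locale's encoding instead of a fixed one."""
    return data.decode(locale_codeset(), errors="replace")
```

With this approach the same program handles an EUC-JP user and a UTF-8 user without any change, provided the corresponding locale is installed.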
Hi,
At Tue, 14 Nov 2000 03:56:07 +0900,
GOTO Masanori <[EMAIL PROTECTED]> wrote:
> I guess that original glibc documents write a "currently supported
> and somewhat tested locales", and Debian's glibc also follows it
> (see glibc's source code localedata/SUPPORTED).
> Only some UTF-8 locale are s
From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Date: Mon, 13 Nov 2000 21:19:37 +0900
> However, the current woody system (with locale 2.1.97-1) has only
> one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this
> model to work well. Why only it?
I guess that original glibc documents write a
On Mon, Nov 13, 2000 at 09:19:37PM +0900, Tomohiro KUBOTA wrote:
> However, the current woody system (with locale 2.1.97-1) has only
> one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this
> model to work well. Why only it?
With the locale 2.2 packages, you can add, say, en_US.UTF-8
On Nov 13, Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote:
>However, the current woody system (with locale 2.1.97-1) has only
>one UTF-8 locale of ko_KR.utf8. UTF-8 locales are needed for this
>model to work well. Why only it?
Maybe because most Europeans do not use Unicode nor are going to use it