Hi,

At Thu, 16 Nov 2000 09:40:26 +0000,
Edmund GRIMLEY EVANS <[EMAIL PROTECTED]> wrote:

> > You are right... the i18n in Linux is not coming well, everybody seems to
> > implement their own scheme...
> > Besides, GNU having chosen a sizeof(wchar_t)==4 doesn't help to encourage
> > using libc's locale support... =/

Memory consumption matters less than whether I can use my daily
encodings (EUC-JP, ISO-2022-JP, and so on) or cannot use them at all.
I had not thought of developers who hesitate to use wchar_t because of
its memory consumption. I find that hard to believe, since memory
consumption is a trifling problem compared with whether a user can use
the software or not.

I would agree with developers who hard-code UTF-8 instead of using
wchar_t, if they also dropped support for 8-bit (or 7-bit) encodings
in the software they develop. I mean: if they (speakers of European
languages, in most cases) need their own daily 7-bit and 8-bit
encodings and therefore keep supporting them, why do they choose not
to support our daily encodings?

> If you are suggesting that sizeof(wchar_t) could be 2, then please
> explain what you think mbtowc(&wc, "\360\220\200\200", 4) should do in
> a UTF-8 locale, and why you think that would be easier for

We cannot assume anything about the concrete values of wchar_t
variables. If a system uses UCS-2 as the internal representation of
wchar_t, that call to mbtowc() will fail. However, there could be a
system where sizeof(wchar_t) is 2 and the internal representation of
wchar_t is not UCS-2, and on such a system that mbtowc() call need not
fail.

# OK, such a system is not likely to exist. I wanted to say that
# UCS is not the only candidate for the internal representation of
# wchar_t. For example, there could well be a system whose wchar_t is
# a Mule-like code, i.e., some bits specifying a coded character
# set and the other bits giving the code point within that set.

FYI: "\360\220\200\200" in UTF-8 means U+10000.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
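
P.S. As a rough check of the FYI above, here is a minimal sketch in C
that decodes the four bytes by hand and tests whether the result fits
in 16 bits. It does not call mbtowc() and does not depend on any
locale. The names decode_utf8_4() and fits_in_ucs2() are hypothetical
helpers made up for this illustration; they are not part of libc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libc function): decode exactly one
 * 4-byte UTF-8 sequence by hand. Returns the code point, or -1 if
 * the bytes are not a well-formed 4-byte sequence. */
long decode_utf8_4(const unsigned char *s)
{
    assert(s != NULL);

    /* The lead byte of a 4-byte sequence has the form 11110xxx. */
    if ((s[0] & 0xF8) != 0xF0)
        return -1;

    /* Each of the three following bytes has the form 10xxxxxx. */
    for (int i = 1; i < 4; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;
    }

    /* 3 payload bits from the lead byte, 6 from each continuation. */
    long cp = ((long)(s[0] & 0x07) << 18)
            | ((long)(s[1] & 0x3F) << 12)
            | ((long)(s[2] & 0x3F) << 6)
            |  (long)(s[3] & 0x3F);

    /* Reject overlong forms and values beyond U+10FFFF. */
    if (cp < 0x10000L || cp > 0x10FFFFL)
        return -1;

    return cp;
}

/* Hypothetical helper: nonzero if the code point can be stored in a
 * single 16-bit unit, as a UCS-2 wchar_t would require. */
int fits_in_ucs2(long cp)
{
    return cp >= 0 && cp <= 0xFFFFL;
}
```

Calling decode_utf8_4() on the bytes "\360\220\200\200" yields
0x10000, and fits_in_ucs2() rejects that value. This is only meant to
show why a 16-bit UCS-2 wchar_t cannot hold the result; a wchar_t with
some other internal representation is outside what this sketch models.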