On Mon, Jun 18, 2007 at 06:46:40PM +0300, Lars Wirzenius wrote:
> On ma, 2007-06-18 at 13:37 +0100, Pierre Habouzit wrote:
> > On Mon, Jun 18, 2007 at 10:48:04AM +0100, Pierre Habouzit wrote:
> > > multi-byte one would be really really bad (as you would end up with e.g.
> > > strings split in the middle of a point code, *brrr* you definitely don't
> > > want that).
> >
> > I wasn't clear it seems, but what I mean is if a programs assumes he's
> > dealing with ascii,
>
> This buggy assumption seems to happen in every locale, not just C.UTF-8,
> and in every other case we treat it as a bug. Is there a standard that
> says every C.* locale must have the same single byte character set as
> the plain C locale?
>
> (Incidentally, the standard for the C language does not require the
> character set in the C locale to be ASCII; EBCDIC, for example, works as
> well. See 5.2.1, "Character sets", for the full description. You might
> be able to find a copy of the standard by searching for ISO/IEC 9899.)

I stand corrected. I read the POSIX base definitions, chapter 7: indeed,
it specifies how collation is done, but not how characters are encoded,
so a C.UTF-8 locale does not seem like such a bad idea. I'm still quite
sure we can find software that assumes the character set in the C locale
is always ASCII, but okay, maybe it's worth fixing those few.

-- 
·O·  Pierre Habouzit
··O  [EMAIL PROTECTED]
OOO  http://www.madism.org