> -----Original Message-----
> From: Gabriel Paubert [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 25, 2007 5:43 AM
> To: Paolo Bonzini
> Cc: Meissner, Michael; [EMAIL PROTECTED]; gcc@gcc.gnu.org
> Subject: Re: [OT] char should be signed by default
>
> On Thu, Jan 25, 2007 at 10:29:29AM +0100, Paolo Bonzini wrote:
> >
> > >>A given program is written in one or the other of these two dialects.
> > >>The program stands a chance to work on most any machine if it is
> > >>compiled with the proper dialect.  It is unlikely to work at all if
> > >>compiled with the wrong dialect.
> > >
> > >It depends on the program, and whether or not chars in the user's
> > >character set is sign extended (ie, in the USA, you likely won't notice
> > >a difference between the two if chars just hold character values).
> >
> > You might notice if a -1 (EOF) becomes a 255 and you get an infinite
> > loop in return (it did bite me).  Of course, this is a bug in that
> > outside the US a 255 character might become an EOF.
>
> That's a common bug with getchar() and similar functions, because people
> put the result into a char before testing it, like:
>
>     char c;
>     while ((c = getchar()) != EOF) {
>         ...
>     }
>
> while the specification of getchar is that it returns an unsigned char
> cast to an int, or EOF, and therefore this code is incorrect independently
> of whether char is signed or not:
> - infinite loop when char is unsigned;
> - incomplete processing of a file because of early detection of EOF
>   when char is signed and you hit a 0xFF character.
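A minimal sketch of the usual fix, assuming the goal is simply to copy stdin
to stdout (the loop body is only illustrative): keep getchar()'s result in an
int so EOF stays out of band, and narrow it to char only after the test.

    #include <stdio.h>

    int main(void)
    {
        int c;  /* int, not char: EOF must remain distinguishable
                   from all 256 unsigned-char values              */

        while ((c = getchar()) != EOF) {
            /* use the character; narrow to char only at this point */
            putchar(c);
        }
        return ferror(stdin) ? 1 : 0;
    }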
Yep.  This was discussed in the ANSI X3J11 committee back in the 1980s, and it
is a real problem (the program above is broken precisely because getchar
already returns its one out-of-band value, EOF, as an int; stuffing the result
into a char throws that distinction away).  Another logical problem occurs on
a system where char and int are the same size: there is no out-of-band value
left to return at all, and in theory the only correct way to detect
end-of-file is to check feof and ferror, which few people do (a sketch of such
a loop follows after the signature).

> I've been bitten by both (although the second one is less frequent now
> since 0xff is invalid in UTF-8).
>
> BTW, I'm of the very strong opinion that char should have been unsigned
> by default because the name itself implies that it is used as an
> enumeration of symbols, specialized to represent text.  When you step
> from one enum value to the following one (staying within the range of
> valid values), you don't expect the new value to become lower than the
> preceding one.

And then there is EBCDIC, where there are seven other code points between
'I' (0xC9) and 'J' (0xD1).  Plus the usual problem in ASCII that the national
characters that are alphabetic aren't grouped with the A-Z, a-z characters.

> Things would be very different if it had been called "byte" or
> "short short int" instead.
>
> 	Gabriel

-- 
Michael Meissner
AMD, MS 83-29
90 Central Street
Boxborough, MA 01719
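A minimal sketch of the feof/ferror approach mentioned above, for the
hypothetical case where char and int have the same width and an EOF return
cannot be told apart from a valid character by value alone; the echo loop
itself is only illustrative.

    #include <stdio.h>

    int main(void)
    {
        for (;;) {
            int c = getchar();
            if (c == EOF) {
                /* EOF by itself is ambiguous if int has no spare values;
                   ask the stream whether this is really the end.        */
                if (feof(stdin) || ferror(stdin))
                    break;
                /* otherwise c is an ordinary character that merely
                   compares equal to EOF, so fall through and use it     */
            }
            putchar(c);
        }
        return ferror(stdin) ? 1 : 0;
    }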