2009/12/28 Andy Koppe:
> 2009/12/28 Rodrigo Medina:
>> Hi,
>> I am moving from cygwin-1.5 and gcc3.4 to cygwin1.7 and gcc4.
>> Some simple programs of mine fail.
>>
>> I am using LC_ALL=es_VE.ISO-8859-15.
>>
>> I have reduced the problem to this example
>>
>> --------------
>> #include <stdio.h>
>> main()
>> {
>> static char* line1 =
>> " This letter has an accent -->á, this one has no accent -->a\n\n";
>> static char* line2 = " ***** another line ******\n\n";
>> static char* line3 =
>> " These letters have an accent -->á, these ones have no accent -->A!\n\n";
>> static char* line4 =
>> " This letter has an accent -->Ã, this one has no accent -->A\n\n";
>> printf(" This letter has an accent -->á, this one has no accent -->a\n\n");
>> printf(line2);
>> printf("%d %d %d\n\n",line1[29],line1[30],line1[31]);
>> printf(line1);
>> printf(line2);
>> printf(" These letters have an accent -->á, these ones have no accent -->A!\n\n");
>> printf(line2);
>> printf("%d %d %d %d\n\n",line3[32],line3[33],line3[34],line3[35]);
>> printf(line3);
>> printf(line2);
>> printf(" This letter has an accent -->Ã, this one has no accent -->A\n\n");
>> printf(line2);
>> printf("%d %d %d\n\n",line4[29],line4[30],line4[31]);
>> printf(line4);
>> printf(line2);
>> printf(" ----- END ------");
>> }
>> ----------------
>>
>> My output is:
>>
>> This letter has an accent -->á, this one has no accent -->a
>>
>> ***** another line ******
>>
>> 62 -31 44
>>
>> This letter has an accent --> ***** another line ******
>>
>> These letters have an accent -->á, these ones have no accent -->A!
>>
>> ***** another line ******
>>
>> 62 -61 -95 44
>>
>> These letters have an accent -->á, these ones have no accent -->A!
>>
>> ***** another line ******
>>
>> This letter has an accent -->Ã, this one has no accent -->A
>>
>> ***** another line ******
>>
>> 62 -61 44
>>
>> This letter has an accent --> ***** another line ******
>>
>> ----- END ------
>>
>> As you can see the output of printf(string_constant) is what
>> I expected. The output of printf(char_array) is truncated at the
>> non-ASCII character.
>
> Reproduced. Looking at the compiler's assembly output, some of the
> printf() calls are replaced by calls to puts(), and those do work
> correctly, whereas the remaining printf() calls with accented
> characters misbehave. So printf()'s handling of non-ASCII characters
> needs a closer look.

Ah, the problem actually is that your program is missing a call to
setlocale(LC_CTYPE, "") to switch to the locale and character set
specified in the environment. In fact, since your program contains
hard-coded ISO-8859-15 strings, you should probably do
setlocale(LC_CTYPE, "<whatever>.ISO-8859-15").

Without a setlocale call, programs use the "C" locale, and on Cygwin
1.7 that implies the UTF-8 character set. Those single accented
ISO-8859-15 characters are invalid when interpreted as UTF-8, so
printf halts there. The accented character pairs like "á", meanwhile,
happen to be valid UTF-8, so they get through.

I couldn't find specific text about invalid bytes in the POSIX printf
spec, but it does say the following:

"The format is a character string, beginning and ending in its initial
shift state, if any. The format is composed of zero or more
directives: ordinary characters, which are simply copied to the output
stream, and conversion specifications, each of which shall result in
the fetching of zero or more arguments."

It's talking about "characters" rather than "bytes" there, which I
think does leave the behaviour for invalid bytes undefined, so
newlib's printf implementation is within its rights to just stop
processing the string at one of those.

Andy
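
P.S. To illustrate why the output stops right after "-->" in the failing cases, here is a minimal sketch of a UTF-8 well-formedness check. It is not newlib's code: utf8_valid_prefix_len is a hypothetical helper made up for this example, and it only checks the lead-byte/continuation-byte structure (it does not reject overlong or surrogate encodings). It returns how many leading bytes of a string form complete UTF-8 sequences.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of newlib or any library): returns the
 * number of leading bytes of s that form complete UTF-8 sequences.
 * Simplified on purpose: only the lead/continuation byte structure is
 * checked.
 */
size_t utf8_valid_prefix_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (p[i] != '\0') {
        size_t extra;   /* continuation bytes expected after the lead byte */
        size_t k;

        if (p[i] < 0x80)
            extra = 0;                          /* plain ASCII */
        else if (p[i] >= 0xC2 && p[i] <= 0xDF)
            extra = 1;                          /* 2-byte sequence */
        else if (p[i] >= 0xE0 && p[i] <= 0xEF)
            extra = 2;                          /* 3-byte sequence */
        else if (p[i] >= 0xF0 && p[i] <= 0xF4)
            extra = 3;                          /* 4-byte sequence */
        else
            break;      /* stray continuation byte or invalid lead byte */

        /* Each continuation byte must have the bit pattern 10xxxxxx.
           A terminating NUL fails this test, so we never read past it. */
        for (k = 1; k <= extra; k++)
            if ((p[i + k] & 0xC0) != 0x80)
                break;
        if (k <= extra)
            break;      /* sequence cut short by a non-continuation byte */

        i += extra + 1;
    }
    return i;
}
```

With the bytes from the report, "-->\xE1, " (the lone ISO-8859-15 byte, -31 in the dump) and "-->\xC3, " (-61) both give a valid prefix of 3 bytes, which is where the output was cut off. "-->\xC3\xA1, " (-61 -95) is accepted in full. The actual fix remains the setlocale() call at the start of main().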