Andreas Schwab wrote:
> Dave Korn writes:
>
>> I'll check. Joseph's suggestion sounds likely: I think Cygwin just switched
>> to use lots of UTF-8 internally, so I might well need to specify an encoding
>> as well. (Sorry for not being as well educated in this field as I really
>> ought to be by now.)
>
> If cygwin wants to be POSIX compatible then the C locale cannot use
> UTF-8.

I'm certainly no expert, but AFAICT POSIX requires nothing of the sort.
locale != character encoding, as below. (I could be wrong, but I think you
could easily have a POSIX-conformant C locale on a system which uses EBCDIC
encoding -- because the default locale definition tables are specified in
terms of character, not hexadecimal, values.)

Also, see the HTML table at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03.

"The tables in Locale Definition describe the characteristics and behavior
of the POSIX locale for data consisting entirely of characters from the
portable character set and the control character set. For other characters,
the behavior is unspecified. For C-language programs, the POSIX locale shall
be the default locale when the setlocale() function is not called."

IOW, it only imposes requirements on how the POSIX locale operates on the
basic 128 characters (*interpreted as characters*, with zero regard to their
hexadecimal values). For ASCII and UTF-8, those characters are the "lower
128" 7-bit hex values, and are the same; behavior with respect to "other
characters" -- the "upper 128" for single byte, and any multibyte -- is
explicitly "unspecified".

So C.UTF-8 is a perfectly valid default POSIX locale.

The underlying issue is actually gcc: its i18n messages appear explicitly to
"translate" from (e.g.) _("error in file '%s'") to
"error in file {fancy-left-quote}%s{fancy-right-quote}" when the encoding is
UTF-8. Working around that by calling setlocale(LC_ALL, "C") isn't
sufficient without also specifying the encoding... But not all systems will
recognize "C.ASCII" as /THE/ C locale, with explicit ASCII encoding; they
might not recognize "C.ASCII" at all.

It looks to me like this silence concerning encoding is a hole in the
standard.

--
Chuck