Andreas Schwab wrote:
> Dave Korn writes:
> 
>>   I'll check.  Joseph's suggestion sounds likely: I think Cygwin just
>> switched to use lots of UTF-8 internally, so I might well need to specify
>> an encoding as well.  (Sorry for not being as well educated in this field
>> as I really ought to be by now.)
> 
> If cygwin wants to be POSIX compatible then the C locale cannot use
> UTF-8.

I'm certainly no expert, but AFAICT POSIX requires nothing of the sort.
locale != character encoding, as below. (I could be wrong, but I think
you could easily have a POSIX-conformant C locale on a system which uses
EBCDIC encoding -- because the default locale definition tables are
specified in terms of characters, not hexadecimal values.)


Also, see the HTML table at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03.


"The tables in Locale Definition describe the characteristics and
behavior of the POSIX locale for data consisting entirely of characters
from the portable character set and the control character set. For other
characters, the behavior is unspecified. For C-language programs, the
POSIX locale shall be the default locale when the setlocale() function
is not called."

IOW, it only imposes requirements on how the POSIX locale operates on
the basic 128 characters (*interpreted as characters*, with zero regard
to their hexadecimal values).  For ASCII and UTF-8, those characters are
the "lower 128" 7-bit values and are encoded identically; behavior with
respect to "other characters" -- the "upper 128" for single byte, and
any multibyte -- is explicitly "unspecified".  So C.UTF-8 is a perfectly
valid default POSIX locale.
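
To make the "characters, not hex values" point concrete, here is a
minimal sketch (mine, not anything from the standard or from gcc). The
helper names portable_chars_behave() and check_c_locale() are made up
for illustration. It only uses character literals, so the same source
should pass on an ASCII, UTF-8 or EBCDIC system:

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Returns 1 if the current locale classifies a few portable characters
   the way the POSIX locale tables describe, 0 otherwise. */
static int portable_chars_behave(void)
{
    /* Character literals, not numeric codes: 'a' is 0x61 in ASCII
       but 0x81 in EBCDIC, and the checks below are the same for both. */
    return isalpha('a') && isupper('Z') && isdigit('7')
        && isspace(' ') && toupper('q') == 'Q'
        && !isalpha('7') && !isdigit('a');
}

/* Hypothetical helper: select the "C" locale and run the checks. */
int check_c_locale(void)
{
    /* ISO C guarantees that the "C" locale exists, so this should not
       fail on a conforming implementation. */
    const char *name = setlocale(LC_ALL, "C");
    assert(name != NULL);
    return portable_chars_behave();
}
```

Note that nothing here says anything about bytes above 0x7f; that is
exactly the part the text quoted above leaves unspecified.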

The underlying issue is actually gcc: its i18n messages appear
explicitly to "translate" from (e.g.) _("error in file '%s'") to "error
in file {fancy-left-quote}%s{fancy-right-quote}" when the encoding is
UTF-8.  Working around that by calling setlocale(LC_ALL, "C") isn't
sufficient without also specifying the encoding...
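
For what it's worth, a program can at least ask which encoding it ended
up with after picking a locale. A rough sketch, assuming a system that
provides the POSIX nl_langinfo() interface; codeset_of() is a name I
invented, and the string it reports varies from system to system:

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch to locale `name` and copy the name of its
   codeset into buf.  Returns 0 on success, -1 if the locale is not
   recognized or buf is too small. */
int codeset_of(const char *name, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (setlocale(LC_ALL, name) == NULL)
        return -1;                      /* locale name not recognized */

    const char *cs = nl_langinfo(CODESET);
    if (strlen(cs) >= len)
        return -1;                      /* caller's buffer is too small */

    strcpy(buf, cs);
    return 0;
}
```

Calling codeset_of("C", buf, sizeof buf) and printing buf shows what
the "C" locale means on a given box, which is the piece of information
the locale name alone doesn't give you.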

But not all systems will recognize "C.ASCII" as /THE/ C locale, with
explicit ASCII encoding; they might not recognize "C.ASCII" at all.
It looks to me as if this silence concerning encoding is a hole in the
standard.
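
About the best a portable program can do, I think, is try the name it
would like and fall back to plain "C". A sketch of that idea (the
function name first_supported_locale() is hypothetical, and I'm not
claiming gcc does or should do this):

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate locale name in order and
   return the first one that setlocale() accepts, or NULL if none is.
   On success the process is left in the locale that was accepted. */
const char *first_supported_locale(const char *const *names, size_t n)
{
    assert(names != NULL);

    for (size_t i = 0; i < n; i++) {
        /* setlocale() returns NULL when it cannot honor the request. */
        if (setlocale(LC_ALL, names[i]) != NULL)
            return names[i];
    }
    return NULL;
}
```

With candidates { "C.ASCII", "C" } this always returns something, since
"C" has to exist, but on a system that doesn't know "C.ASCII" you are
back to an unspecified encoding.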

--
Chuck
