According to Andy Koppe on 12/29/2009 6:30 AM:
>> Remember, POSIX states that any use in a character context of bytes with
>> the 8th-bit set is specifically undefined in the C locale (whether that be
>> C.ASCII or C.UTF-8).
>
> I very much disagree with that. C.ASCII and C.UTF-8 are different
> locales from plain "C", and the whole point of the explicitly stated
> charset is to define the meaning of bytes beyond 7-bit ASCII.
Point taken: an explicit "C.UTF-8" is a request for a specific charset
along with C semantics (such as no translation of output messages,
POSIX-mandated formatting for time and money, ...). Because the charset is
explicit, the use of 8-bit bytes is well-defined in our implementation.
And since POSIX does not specify C.UTF-8, you've already left the realm of
portability and entered implementation-defined territory.

But my point remains: an explicit "C" is specified to be charset-agnostic,
so a portable program requesting "C" should not expect any particular
behavior from 8-bit bytes in character contexts. Programs that use
LC_ALL=C to get 8-bit transparency from character contexts are flat-out
non-portable. They do get other well-defined benefits on 8-bit bytes (such
as sorting by strcmp instead of strcoll, fixed-format messages, ...), but
only insofar as those 8-bit bytes appear in byte contexts rather than
character contexts.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             e...@byu.net