On 06/08/2011 10:14 PM, Aharon Robbins wrote:
> So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the
> Gordian knot and make ranges behave like the C locale, the way it's long
> been documented, and as most people expect. Those who want the POSIX
> behavior can still get it using --posix.

This comment and the ensuing thread seem to be assuming old POSIX. In new POSIX, that is, in POSIX 1003.1-2008, the standardization committee removed the old, bogus requirement of using collating element order. The new rule is that the regular expression [a-z] has unspecified behavior outside the C (or POSIX) locale. So the new gawk behavior will conform to POSIX, even without the --posix option.

I suggest that gawk's behavior for [a-z] be the same regardless of whether --posix is specified, and that this behavior be what users expect (namely, the ASCII character range). This will be simpler. Similarly for grep, glibc, etc.

For the POSIX 1003.1-2008 rule, see rule 7 of:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

and for the reasoning behind the rule change, see:

http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05
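
To make the difference concrete, here is a minimal sketch in Python of the two ways a range such as [a-z] could be interpreted. It is not gawk's or glibc's code. The interleaved collating sequence below is a made-up stand-in for what a non-C locale might use, and the function names are hypothetical; real locales differ in their details.

```python
import string

# Hypothetical collating sequence "aAbB...zZ", assumed for illustration only.
# Real locale collation tables are larger and vary from locale to locale.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range_codepoint(ch, lo, hi):
    """C-locale style: compare character codes directly."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collation(ch, lo, hi):
    """Old-POSIX style: compare positions in the collating sequence."""
    pos = COLLATION.find(ch)
    if pos < 0:
        return False  # character is not in our toy sequence
    return COLLATION.index(lo) <= pos <= COLLATION.index(hi)


if __name__ == "__main__":
    for ch in "mBZ":
        print(ch,
              in_range_codepoint(ch, "a", "z"),
              in_range_collation(ch, "a", "z"))
```

Under the code-point rule, [a-z] matches only the 26 lowercase ASCII letters. Under the toy collating sequence, it also matches 'B' (which sorts between 'b' and 'c') but not 'Z' (which sorts after 'z'), which is the kind of result that surprises users.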