Hi All.

> Date: Thu, 09 Jun 2011 10:14:01 -0700
> From: Paul Eggert <egg...@cs.ucla.edu>
> To: Paolo Bonzini <bonz...@gnu.org>
> CC: Aharon Robbins <arn...@skeeve.com>, bug-grep <bug-g...@gnu.org>,
>         bug-gnulib <bug-gnulib@gnu.org>, k...@freefriends.org
> Subject: Re: Dealing with character ranges in grep
>
> On 06/08/2011 10:14 PM, Aharon Robbins wrote:
>
> > So, for the upcoming gawk 4.0, I decided (as Karl put it) to cut the
> > Gordian knot and make ranges behave like the C locale, the way it's long
> > been documented, and as most people expect. Those who want the POSIX
> > behavior can still get it using --posix.
>
> This comment and the ensuing thread seems to be assuming old POSIX.
> In new POSIX, that is, in POSIX 1003.1-2008, the standardization committee
> removed the old, bogus requirement of using collating element order.
> The new rule is that the regular expression [a-z] has an unspecified
> behavior outside the C (or POSIX) locale. So the new gawk behavior
> will conform to POSIX, even without the --posix option.
>
> I suggest that gawk's behavior for [a-z] be the same regardless of whether
> --posix is specified, and that this behavior be what users expect
> (namely, the ASCII character range). This will be simpler.

This is now done and pushed. I had to rearrange a chunk of the
documentation, too. :-)

With respect to the other issues raised, I think I will only express
the facts / my opinions as they relate to gawk, and leave everything
else alone.

1. Gawk's default is --with-included-regex. Gawk's regex is based on
   GLIBC's, but with fixes I've accrued over the years. Since I want
   gawk to work correctly everywhere, the default is to use the regex
   routines that I supply.

2. With respect to both equivalence classes and collating elements,
   I have to wonder if they are used much in practice. I do not recall
   even a single email or bug report about the fact that gawk does not
   support either of these.

3. If I understand the conversation, the gist is that
   RE_RANGES_IGNORE_LOCALES is not needed, since the latest standard
   allows us to just fix the code to use Rational Range Interpretation.
   In principle, I'm all for this, but in practice, I'm going to leave
   gawk's code alone for now (there's always 4.1 :-).

   I do think it's worth taking this up with Uli, but that can be
   pursued separately. In the worst case, RE_RANGES_IGNORE_LOCALES
   might be an acceptable addition if he (or the other maintainers)
   don't want to move off the current way of doing things.

4. If I can help get grep and sed to move to RRI, I'd like to do so.
   (I have preliminary patches for both.) But I'm not going to hold up
   the gawk release for those other programs.

Thanks again to everyone,

Arnold
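
P.S. For anyone following the thread who wants to see the difference concretely, here is a rough sketch of the two ways a range like [a-z] can be read. It is only an illustration in Python, not gawk's or GLIBC's regex code. The function names and the interleaved collation order (a A b B ... z Z) are assumptions made up for the example; real locales define their own orders.

```python
import string

# ASSUMPTION: a made-up collation order in which lower- and uppercase
# letters interleave (a A b B ... z Z). Real locales may differ.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range_rri(ch, lo, hi):
    """Rational Range Interpretation: compare by character code value."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collating(ch, lo, hi, order=COLLATION):
    """Collation-based range: compare by position in a collation order."""
    if ch not in order:
        return False
    return order.index(lo) <= order.index(ch) <= order.index(hi)


def expand(lo, hi, member):
    """List the ASCII letters that a range lo-hi would match."""
    return "".join(c for c in string.ascii_letters if member(c, lo, hi))


# What [a-z] matches under each interpretation.
rri_az = expand("a", "z", in_range_rri)
coll_az = expand("a", "z", in_range_collating)

print("RRI       [a-z]:", rri_az)
print("collating [a-z]:", coll_az)
```

Under the code-value reading, [a-z] covers exactly the 26 lowercase ASCII letters. Under the assumed collation order it also picks up every uppercase letter except Z, which is the kind of surprise the C-locale behavior avoids.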