Hi Paul.

> As far as 'grep' is concerned, it'll trust what regcomp does here, so we
> do have some freedom to change the code in this area. However, it looks
> to me like your patch would do the wrong thing for unibyte locales where
> btowc (b) returns a value that is neither b nor WEOF. Also, the rest of the
> code assumes that if btowc returns WEOF in a multibyte locale then there
> won't be a match (see the setup code in init_dfa, and I have the nagging
> feeling that this assumption is embedded elsewhere). So, how about the
> attached more-conservative patch instead?

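For anyone following along, the three btowc outcomes mentioned above (the
result equals b, the result is WEOF, or the result is some other value) can
be told apart with a few lines of C. This is only a rough sketch and is not
taken from the patch; the names byte_class, classify_byte and
count_byte_classes are made up for illustration, and the results depend
entirely on whichever locale is active when the functions are called.

```c
#include <wchar.h>

/* Hypothetical names, for illustration only (not from regcomp.c). */
enum byte_class {
    BYTE_IDENTITY,  /* btowc (b) == b */
    BYTE_REMAPPED,  /* btowc (b) is neither b nor WEOF */
    BYTE_INVALID    /* btowc (b) == WEOF, i.e. an encoding error */
};

/* Classify one byte according to btowc in the current locale. */
static enum byte_class classify_byte(unsigned char b)
{
    wint_t wc = btowc(b);

    if (wc == WEOF)
        return BYTE_INVALID;
    if (wc == (wint_t) b)
        return BYTE_IDENTITY;
    return BYTE_REMAPPED;
}

/* Tally all 256 byte values; counts[] is indexed by enum byte_class. */
static void count_byte_classes(int counts[3])
{
    counts[BYTE_IDENTITY] = 0;
    counts[BYTE_REMAPPED] = 0;
    counts[BYTE_INVALID] = 0;

    for (int b = 0; b < 256; b++)
        counts[classify_byte((unsigned char) b)]++;
}
```

Calling setlocale (LC_ALL, "") before count_byte_classes and printing the
three counters should show whether a given unibyte locale has any bytes in
the "remapped" class, which is the case the more-conservative patch is meant
to handle.
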
I applied that patch and gawk passes its tests. I will probably keep it.
See one comment, below.

> Again, it'd be helpful to know what the problem actually was.

I don't have detailed enough records to be able to tell when all these
small changes were added and why. I will keep them, since the hassle of
removing them, finding out which systems want them, and putting them back
is more than I care to deal with. I may, one day, just drop in GNULIB's
versions. But not yet.

> diff --git a/ChangeLog b/ChangeLog
> index 181f709..a870e86 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,11 @@
> +2016-01-21 Paul Eggert <egg...@cs.ucla.edu>
> +
> + regex: treat [x] as x if x is a unibyte encoding error
> + Problem reported by Aharon Robbins in:
> + http://lists.gnu.org/archive/html/bug-gnulib/2016-01/msg00091.html
> + * lib/regcomp.c (parse_byte) [_LIBC && RE_ENABLE_I18N]: New function.
> + (build_range_exp) [_LIBC && RE_ENABLE_I18N]: Use it.

I think you mean ! _LIBC && RE_ENABLE_I18N.

Thanks,

Arnold