Re: gawk regex stuff you may want

Aharon Robbins Sat, 23 Jan 2016 20:02:52 -0800

Hi Paul.

> As far as 'grep' is concerned, it'll trust what regcomp does here, so we 
> do have some freedom to change the code in this area. However, it looks 
> to me like your patch would do the wrong thing for unibyte locales where 
> btowc (b) returns a value that is neither b nor WEOF. Also, the rest of the 
> code assumes that if btowc returns WEOF in a multibyte locale then there 
> won't be a match (see the setup code in init_dfa, and I have the nagging 
> feeling that this assumption is embedded elsewhere). So, how about the 
> attached more-conservative patch instead?


I applied that patch and gawk passes its tests. I will probably
keep it. See one comment below.

> Again, it'd be helpful to know what the problem actually was.

I don't have detailed enough records to be able to tell when all these
small changes were added and why. I will keep them, since the hassle of
removing them, finding out which systems want them, and putting them
back is more than I care to deal with.

I may, one day, just drop in GNULIB's versions.  But not yet.

> diff --git a/ChangeLog b/ChangeLog
> index 181f709..a870e86 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,11 @@
> +2016-01-21  Paul Eggert  <egg...@cs.ucla.edu>
> +
> +     regex: treat [x] as x if x is a unibyte encoding error
> +     Problem reported by Aharon Robbins in:
> +     http://lists.gnu.org/archive/html/bug-gnulib/2016-01/msg00091.html
> +     * lib/regcomp.c (parse_byte) [_LIBC && RE_ENABLE_I18N]: New function.
> +     (build_range_exp) [_LIBC && RE_ENABLE_I18N]: Use it.

I think you mean ! _LIBC && RE_ENABLE_I18N.

Thanks,

Arnold
