On Sat, Oct 31, 2020 at 9:17 AM Gonzalo Padrino <grimalg.on+...@gmail.com> wrote:
> While using GNU grep v3.4 in an Ubuntu 20.04 userspace running on top of
> Win10 WSL (yeah, i know... but also checked in other envs) i discovered
> what seems like an obvious bug (if i'm not mistaken).
> The bug:
> -----
> me@host:~$ echo 'xxxxy' |grep -E '^x+x+x+x+y$'
> xxxxy
> me@host:~$ echo 'xxxy' |grep -E '^x+x+x+x+y$'
> xxxy
> me@host:~$ echo 'xxy' |grep -E '^x+x+x+x+y$'
> xxy
> me@host:~$ echo 'xy' |grep -E '^x+x+x+x+y$'
>
> ----
> ...the terminal supports ansi color escapes, and what's really weird is
> that only the result from the first command is colored in red. First and
> fourth commands yield correct results; the second and third do not, as they
> should not match it's input.
>
> I've tested releases from v3.1 to latest v3.5 and found the anomalous
> behaviour in version v3.2 through v3.5. A (quick and clunky) git bisect led
> me to believe it was introduced about two years ago, possibly in commit
> 123620af88f55c3e0cc9f0aed7311c72f625bc82 (
> https://git.savannah.gnu.org/cgit/grep.git/commit/?id=123620af88f55c3e0cc9f0aed7311c72f625bc82).
> If this is true, it would mean either the bug is in gnulib, or maybe grep
> needed to do some kind of extra handling on it's side.

Thank you for reporting that. I confirm this is a bug in the very latest.
This mistakenly matches:

  $ echo xxy |grep -E '^x+x+x+y$'
  xxy

That regular expression requires that any match have at least three
leading 'x's.

This is indeed due to a bug in gnulib's lib/dfa.c. So far, I've found
that we can band-aid fix it by disabling part of merge_nfa_state's
optimizations with this patch, but I do not propose to make this change.
This is just to show where the problem lies. I'm pretty sure we can
retain and correct the optimization.

diff --git a/lib/dfa.c b/lib/dfa.c
index 74aafa2ee..087c266c5 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -2459,7 +2459,7 @@ merge_nfa_state (struct dfa *d, idx_t tindex, char *flags,
         continue;
       if (flags[sindex] & OPT_REPEAT)
-        delete (sindex, &follows[sindex]);
+        continue;
       merge2 (&follows[dindex], &follows[sindex], merged);
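
As a sanity check on the expected results, here is a minimal sketch that runs the same inputs through an independent regex engine (Python's `re` module, not grep or gnulib's dfa.c). It assumes that Python's syntax and POSIX ERE agree for these simple patterns, which only use `^`, `$`, literals and `+`. The helper name `expected_match` is made up for this illustration.

```python
import re

# Patterns copied from the report and from the reply above.
FOUR_X = r"^x+x+x+x+y$"   # needs at least four leading 'x's
THREE_X = r"^x+x+x+y$"    # needs at least three leading 'x's


def expected_match(pattern: str, line: str) -> bool:
    """Return True if `line` (without trailing newline) matches `pattern`.

    This uses Python's backtracking matcher as a reference oracle; it says
    nothing about how grep's DFA matcher is implemented.
    """
    return re.search(pattern, line) is not None


if __name__ == "__main__":
    for line in ["xxxxy", "xxxy", "xxy", "xy"]:
        verdict = "match" if expected_match(FOUR_X, line) else "no match"
        print(f"{line!r} against {FOUR_X!r}: {verdict}")
```

With this oracle only 'xxxxy' matches the four-x pattern, and 'xxy' does not match the three-x pattern, which is the behavior a corrected dfa.c should reproduce.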