On Sun, 11 Oct 2015 21:34:05 -0700 Paul Eggert <egg...@cs.ucla.edu> wrote:
> greg boyd wrote:
> > test case (single line)
> > abchelloabc
> >
> > grep does not find the line with grep -e '^hello' nor with grep -e 'hello$'
> > however, the line is output with
> > grep -e '^hello' -e 'hello$'
>
> Oooo, that's a good one.  Give your student extra credit!  As it happens, the
> bug was recently fixed by this patch by Norihiro Tanaka:
>
> http://git.savannah.gnu.org/cgit/grep.git/commit/?id=256a4b494fe1c48083ba73b4f62607234e4fefd5
>
> and the fix should appear in the next grep release.  However, since the patch
> was supposed to affect only performance, it appears that the bug fix was due
> to luck, and I'm taking the liberty of adding your student's test case by
> installing the attached further patch, to help prevent this bug from coming
> back in a future version.

I found that the above patch is itself buggy, so it is not a real fix.  It
returns a shorter `must' than expected.  For example, the `must' for the
pattern `.hello' should be `hello', but `hell' is returned because of this
bug.  Likewise, the `must' for the pattern `^hello' should be `hello', but
`hell' is returned.  This causes a slight performance loss and happens to
hide bug#21670.  BTW, I guess the bug does not otherwise change external
behavior.

The first patch fixes that bug.  After it is applied, bug#21670 appears
again, and the second patch fixes bug#21670.  When a pattern has ^ and/or $
but the begline and/or endline flag of mp has been turned off, EXACT should
be false.
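To illustrate the condition in the last sentence, here is a minimal sketch
in C.  It is not code from grep: the names must_flags, may_be_exact,
contains and anchored_either are hypothetical, and the sketch assumes that
for a pattern such as ^hello|hello$ the merged mp ends up with both flags
turned off, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two anchor flags carried by the
   "must" record; nothing else from dfa.c is modelled here.  */
struct must_flags
{
  bool begline;   /* the surviving literal is still anchored with ^ */
  bool endline;   /* the surviving literal is still anchored with $ */
};

/* The condition described above: if the pattern used ^ (or $) somewhere
   but the final flags no longer record it, the literal must not be
   treated as an exact match.  */
static bool
may_be_exact (bool need_begline, bool need_endline, struct must_flags mp)
{
  return (!need_begline || mp.begline) && (!need_endline || mp.endline);
}

/* Plain substring search, standing in for an unanchored keyword match.  */
static bool
contains (char const *line, char const *kw)
{
  assert (line && kw);
  return strstr (line, kw) != NULL;
}

/* What ^KW|KW$ asks for: KW at the start or at the end of LINE.  */
static bool
anchored_either (char const *line, char const *kw)
{
  assert (line && kw);
  size_t n = strlen (line), k = strlen (kw);
  if (n < k)
    return false;
  return strncmp (line, kw, k) == 0 || strcmp (line + n - k, kw) == 0;
}
```

For the reported line, contains ("abchelloabc", "hello") is true while
anchored_either ("abchelloabc", "hello") is false, which is the mismatch
behind bug#21670.  With both flags turned off, may_be_exact (true, true, ...)
is false, so the unanchored literal would no longer be treated as exact.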

From d4f86e62c854f323feccea76889d2298a5f335d4 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Tue, 13 Oct 2015 11:43:49 +0900
Subject: [PATCH 1/2] dfa: don't use DFA for exact matching

If a pattern is constrained to the beginning of a line, DFA is used
after a match in KWset, even when the pattern is exact.  That behavior
is not expected.  Now, whenever a pattern is exact, DFA is not used.

* src/dfa.c (dfamust): Don't use DFA for exact matching.
---
 src/dfa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dfa.c b/src/dfa.c
index ac5129b..5b9a4fe 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -4135,7 +4135,7 @@ dfamust (struct dfa const *d)
                 = case_fold && MB_CUR_MAX == 1 ? toupper (t) : t;
             }
           mp->is[i] = mp->left[i] = mp->right[i] = '\0';
-          mp->in = enlist (mp->in, mp->is, i - 1);
+          mp->in = enlist (mp->in, mp->is, i);
           break;
         }
     }
-- 
2.4.1

From 3be0bc551387911faf98d89dc076f39561a753f3 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Tue, 13 Oct 2015 01:19:43 +0900
Subject: [PATCH 2/2] dfa: fix bug in alternate of sub-patterns different in
 only the constraints

A line may incorrectly match an alternation of sub-patterns that differ
only in their constraints, e.g., ^a|a$ in an extended regular expression.
This change fixes the bug.  Reported by Greg Boyd in
http://debbugs.gnu.org/21670

* src/dfa.c (dfamust): For a pattern with constraints, check that it is
matched including the constraints, to judge whether it is exact.
---
 src/dfa.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/dfa.c b/src/dfa.c
index 5b9a4fe..cdea4e5 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3940,6 +3940,8 @@ dfamust (struct dfa const *d)
   bool exact = false;
   bool begline = false;
   bool endline = false;
+  bool need_begline = false;
+  bool need_endline = false;
 
   for (size_t ri = 0; ri < d->tindex; ++ri)
     {
@@ -3949,10 +3951,12 @@ dfamust (struct dfa const *d)
         case BEGLINE:
           mp = allocmust (mp, 2);
           mp->begline = true;
+          need_begline = true;
           break;
         case ENDLINE:
           mp = allocmust (mp, 2);
           mp->endline = true;
+          need_endline = true;
           break;
         case LPAREN:
         case RPAREN:
@@ -4029,7 +4033,9 @@ dfamust (struct dfa const *d)
             result = mp->in[i];
           if (STREQ (result, mp->is))
             {
-              exact = true;
+              if ((!need_begline || mp->begline) && (!need_endline
+                                                     || mp->endline))
+                exact = true;
               begline = mp->begline;
               endline = mp->endline;
             }
-- 
2.4.1