On Sun, 11 Oct 2015 21:34:05 -0700
Paul Eggert <egg...@cs.ucla.edu> wrote:

> greg boyd wrote:
> > test case (single line)
> > abchelloabc
> >
> > grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
> > however, the line is output with
> > grep -e '^hello' -e 'hello$'
> 
> Oooo, that's a good one.  Give your student extra credit!  As it happens, the 
> bug was recently fixed by this patch by Norihiro Tanaka:
> 
> http://git.savannah.gnu.org/cgit/grep.git/commit/?id=256a4b494fe1c48083ba73b4f62607234e4fefd5
> 
> and the fix should appear in the next grep release.  However, since the patch 
> was supposed to affect only performance, it appears that the bug fix was due 
> to luck, and I'm taking the liberty of adding your student's test case by 
> installing the attached further patch, to help prevent this bug from coming 
> back in a future version.

I found that the above patch is also buggy, so it is not a real fix.  It
returns a shorter `must' than expected.  E.g., the `must' for pattern
`.hello' should be `hello', but `hell' is returned because of this bug.
Likewise, the `must' for pattern `^hello' should be `hello', but `hell' is
returned.  This causes a slight performance loss and makes bug#21670
disappear.  BTW, I guess the bug does not change external behavior.
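
The off-by-one described above can be illustrated with a small standalone
sketch.  It does not call grep's real enlist; copy_prefix is a hypothetical
stand-in that only copies the first LEN bytes of a string, which is enough
to show how passing `i - 1' instead of `i' drops the last byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for recording a `must' candidate (not grep's
   enlist): return a freshly allocated copy of the first LEN bytes of S.
   The caller frees the result.  */
static char *
copy_prefix (char const *s, size_t len)
{
  assert (len <= strlen (s));
  char *p = malloc (len + 1);
  if (!p)
    abort ();
  memcpy (p, s, len);
  p[len] = '\0';
  return p;
}
```

With S = "hello" and i = 5 (the index of the terminating NUL),
copy_prefix (s, i - 1) yields "hell", whereas copy_prefix (s, i) yields
the full "hello".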

The first patch fixes this bug.  After it is applied, bug#21670 appears
again.  The second patch then fixes bug#21670.

When the pattern contains ^ and/or $ but the corresponding begline and/or
endline flag of mp is turned off, EXACT should be false.
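
As a rough sketch of that rule, the condition can be written as a small
predicate.  The function name and parameter names below are hypothetical
and do not appear in dfa.c; NEED_* record whether ^ or $ was seen anywhere
in the pattern, and HAVE_* stand for the flags of the current mp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of dfa.c: a candidate `must' may be
   treated as exact only if every line constraint that the pattern
   uses is still set on the candidate.  */
static bool
exact_with_constraints (bool need_begline, bool have_begline,
                        bool need_endline, bool have_endline)
{
  return ((!need_begline || have_begline)
          && (!need_endline || have_endline));
}
```

For `^hello' alone, ^ is needed and the begline flag is set, so the result
is true.  For `^hello' combined with `hello$', both constraints are needed;
assuming both flags end up off for the combined candidate, as described
above, the result is false and the match is not treated as exact.
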
From d4f86e62c854f323feccea76889d2298a5f335d4 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Tue, 13 Oct 2015 11:43:49 +0900
Subject: [PATCH 1/2] dfa: don't use DFA for exact matching

If a pattern has a beginning-of-line constraint, DFA is used after a
match in KWset, even when the pattern is exact.  This behavior is not
expected.  Now, whenever a pattern is exact, DFA is not used.

* src/dfa.c (dfamust): Don't use DFA for exact matching.
---
 src/dfa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dfa.c b/src/dfa.c
index ac5129b..5b9a4fe 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -4135,7 +4135,7 @@ dfamust (struct dfa const *d)
                 = case_fold && MB_CUR_MAX == 1 ? toupper (t) : t;
             }
           mp->is[i] = mp->left[i] = mp->right[i] = '\0';
-          mp->in = enlist (mp->in, mp->is, i - 1);
+          mp->in = enlist (mp->in, mp->is, i);
           break;
         }
     }
-- 
2.4.1

From 3be0bc551387911faf98d89dc076f39561a753f3 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Tue, 13 Oct 2015 01:19:43 +0900
Subject: [PATCH 2/2] dfa: fix bug in alternate of sub-patterns different in
 only the constraints

A line may incorrectly match an alternation of sub-patterns that differ
only in their constraints, e.g., ^a|a$ in an extended regular expression.
This change fixes the bug.  Reported by Greg Boyd in
http://debbugs.gnu.org/21670

* src/dfa.c (dfamust): For a pattern with constraints, check that it is
matched including the constraints, to judge whether it is exact.
---
 src/dfa.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/dfa.c b/src/dfa.c
index 5b9a4fe..cdea4e5 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3940,6 +3940,8 @@ dfamust (struct dfa const *d)
   bool exact = false;
   bool begline = false;
   bool endline = false;
+  bool need_begline = false;
+  bool need_endline = false;
 
   for (size_t ri = 0; ri < d->tindex; ++ri)
     {
@@ -3949,10 +3951,12 @@ dfamust (struct dfa const *d)
         case BEGLINE:
           mp = allocmust (mp, 2);
           mp->begline = true;
+          need_begline = true;
           break;
         case ENDLINE:
           mp = allocmust (mp, 2);
           mp->endline = true;
+          need_endline = true;
           break;
         case LPAREN:
         case RPAREN:
@@ -4029,7 +4033,9 @@ dfamust (struct dfa const *d)
               result = mp->in[i];
           if (STREQ (result, mp->is))
             {
-              exact = true;
+              if ((!need_begline || mp->begline) && (!need_endline
+                                                     || mp->endline))
+                exact = true;
               begline = mp->begline;
               endline = mp->endline;
             }
-- 
2.4.1
