I had assumed that when the grep matcher uses the DFA superset, a match
could span multiple lines, but that was wrong.  \n cannot occur inside a
multibyte character, so with the DFA superset a match is always confined
to a single line.
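
To illustrate why a newline byte cannot hide inside a multibyte character,
here is a minimal sketch that assumes valid UTF-8 input only; other
encodings are not covered.  The helpers utf8_char_len and
newline_inside_multibyte are hypothetical names for illustration and are
not part of grep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep: length in bytes of the UTF-8
   character that starts with LEAD, or 0 if LEAD cannot start one.  */
static size_t
utf8_char_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;                   /* ASCII, including '\n' (0x0A) */
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  if ((lead & 0xF8) == 0xF0)
    return 4;
  return 0;                     /* continuation or invalid byte */
}

/* Hypothetical helper: return true if a '\n' byte appears as a trailing
   byte of a multibyte character in S[0..N).  The caller must pass valid
   UTF-8; trailing bytes are then always in 0x80..0xBF, so the result is
   always false.  */
static bool
newline_inside_multibyte (const char *s, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      size_t len = utf8_char_len ((unsigned char) s[i]);
      assert (len != 0 && len <= n - i);
      for (size_t k = 1; k < len; k++)
        if (s[i + k] == '\n')
          return true;
      i += len;
    }
  return false;
}
```

Since every byte of a multibyte UTF-8 character has its high bit set, a
byte-wise search for \n finds only real line terminators.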

This patch therefore sets ALLOW_NL to 0 and removes the loop.  In the
special case below, grep runs about 2x faster.
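
Once a match position is known to lie within one line, the search range
can be narrowed to that line directly.  The following is only a sketch of
that narrowing step under simplified assumptions; struct line_span and
enclosing_line are hypothetical names, and the backward scan is a portable
stand-in for memrchr, which is a GNU extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration: the half-open byte range
   [beg, end) of one line, including its terminator if present.  */
struct line_span
{
  const char *beg;
  const char *end;
};

/* Hypothetical helper, not grep's code: return the line that contains
   POS within the buffer [BUF, BUFLIM), using EOL as the terminator.  */
static struct line_span
enclosing_line (const char *buf, const char *buflim, const char *pos,
                char eol)
{
  assert (buf <= pos && pos < buflim);

  /* Scan backward to just after the previous terminator.  */
  const char *beg = pos;
  while (beg > buf && beg[-1] != eol)
    beg--;

  /* Scan forward to just after the next terminator, if any.  */
  const char *nl = memchr (pos, eol, (size_t) (buflim - pos));
  struct line_span span = { beg, nl ? nl + 1 : buflim };
  return span;
}
```

With the buffer "a\nbcd\ne" and POS pointing at 'c', the span covers
"bcd\n", which is the only range the real DFA then needs to examine.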

$ yes "$(printf 'a\nb')" | head -100000000 >k

(before)
$ time -p src/grep '\(a\|x\).\(b\|x\)' k
real 1.39
user 1.05
sys 0.31

(after)
$ time -p src/grep '\(a\|x\).\(b\|x\)' k
real 0.58
user 0.49
sys 0.09

From 2cb081032303f1bee0fec0be1f2482bcbe427e18 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 2 Nov 2014 14:39:43 +0900
Subject: [PATCH] grep: always match single line only with DFA superset

\n cannot occur inside a multibyte character.  So an input always
matches single line only with DFA superset.

* src/dfasearch.c (EGexecute): Simplify it with above.
---
 src/dfasearch.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/dfasearch.c b/src/dfasearch.c
index 8052ef0..fbf0fac 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -283,23 +283,22 @@ EGexecute (char *buf, size_t size, size_t *match_size,
               /* Keep using the superset while it reports multiline
                  potential matches; this is more likely to be fast
                  than falling back to KWset would be.  */
-              while ((next_beg = dfaexec (superset, dfa_beg, (char *) end, 1,
-                                          &count, NULL))
-                     && next_beg != end
-                     && count != 0)
+              next_beg = dfaexec (superset, dfa_beg, (char *) end, 0,
+                                  &count, NULL);
+              if (next_beg == NULL || next_beg == end)
+                continue;
+
+              /* Narrow down to the line we've found.  */
+              if (count != 0)
                 {
-                  /* Try to match in just one line.  */
-                  count = 0;
                   beg = memrchr (buf, eol, next_beg - buf);
                   beg++;
                   dfa_beg = beg;
                 }
-              if (next_beg == NULL || next_beg == end)
-                continue;
-
-              /* Narrow down to the line we've found.  */
               end = memchr (next_beg, eol, buflim - next_beg);
               end = end ? end + 1 : buflim;
+
+              count = 0;
             }
 
           /* Try matching with DFA.  */
-- 
1.7.1