sed and gawk enable the fastmap when they use regex, but grep does not.
With the fastmap enabled, I expect grep to run faster on patterns that
are handed to regex, such as the back-reference pattern below.
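
For readers who have not used the fastmap, here is a minimal sketch of
how a caller of the GNU regex API can enable it.  This is not the grep
code: first_match is a hypothetical helper, RE_SYNTAX_POSIX_BASIC is an
assumption made for the example, and the sketch assumes glibc's
<regex.h> with _GNU_SOURCE defined.

```c
#define _GNU_SOURCE             /* expose the GNU regex API in <regex.h> */
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first match of the
   basic regular expression PATTERN in TEXT, -1 if nothing matches,
   or -2 on an allocation, compile, or internal matcher error.  */
static int
first_match (char const *pattern, char const *text)
{
  assert (pattern && text);

  struct re_pattern_buffer buf;
  memset (&buf, 0, sizeof buf);

  /* The caller supplies the fastmap: one byte per possible first
     byte of a match.  Leaving it NULL disables the optimization.  */
  buf.fastmap = malloc (UCHAR_MAX + 1);
  if (!buf.fastmap)
    return -2;

  re_set_syntax (RE_SYNTAX_POSIX_BASIC);
  char const *err = re_compile_pattern (pattern, strlen (pattern), &buf);
  if (err)
    {
      free (buf.fastmap);
      return -2;
    }

  /* re_search fills in the fastmap on first use, then consults it to
     skip start positions that cannot begin a match.  */
  regoff_t len = (regoff_t) strlen (text);
  regoff_t pos = re_search (&buf, text, len, 0, len, NULL);

  regfree (&buf);               /* in glibc this also frees buf.fastmap */
  return (int) pos;
}
```

With this helper, first_match ("\\([a-b]\\)\\1", "xxaaxx") returns 2,
the offset of "aa".  The result is the same with or without the
fastmap; only the number of start positions that regex has to try
changes.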

before:
$ time -p env LC_ALL=ja_JP.eucjp src/grep '\([a-b]\)\1' k
real 7.83
user 7.62
sys 0.07

after:
$ time -p env LC_ALL=ja_JP.eucjp src/grep '\([a-b]\)\1' k
real 0.46
user 0.38
sys 0.07

However, when grep uses the fastmap, the case-fold-titlecase test fails.
That means grep's current behavior differs from that of sed and gawk,
which already use the fastmap, although the failure seems to be a bug
in regex.

From 1337006597a7d7e14993af14e57d47d6b483fb0d Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 17 Jul 2016 01:25:18 +0900
Subject: [PATCH] grep: use fastmap in regex

* src/dfasearch.c (GEAcompile): Use fastmap in regex.
---
 src/dfasearch.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/src/dfasearch.c b/src/dfasearch.c
index 8052ef0..e5223e5 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -154,6 +154,9 @@ GEAcompile (char const *pattern, size_t size, reg_syntax_t syntax_bits)
       patterns = xnrealloc (patterns, pcount + 1, sizeof *patterns);
       patterns[pcount] = patterns0;
 
+      patterns[pcount].regexbuf.fastmap
+        = xmalloc ((UCHAR_MAX + 1) * sizeof (char));
+
       char const *err = re_compile_pattern (p, len,
                                             &(patterns[pcount].regexbuf));
       if (err)
-- 
1.7.1
