Jim Meyering <j...@meyering.net> wrote:
> dfa.c's match_mb_charset function *is* used, e.g., in a
> command like this one:
> 
>   printf '\0' |src/grep -aE '^\s?$'

Wow, that isn't good.  I think the behavior of `fails' should be the
same as that of `trans', except that `fails' also checks the accepting
conditions, and that includes the following part.  match_mb_charset()
should be avoided as far as possible, as it doesn't support collating
symbols and equivalence classes.

>               /* Falling back to the glibc matcher in this case gives
>                  better performance (up to 25% better on [a-z], for
>                  example) and enables support for collating symbols and
>                  equivalence classes.  */
>               if (d->states[s].has_mbcset && backref)
>                 {
>                   *backref = 1;
>                   goto done;
>                 }
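
As a rough illustration only (not part of the patch), the effect of
placing that check inside a macro shared by two call sites can be
sketched as below.  Every name here (toy_state, toy_exec,
TOY_TRANSITION) is hypothetical; only the has_mbcset flag and the
backref out-parameter mirror names in the quoted code, and the real
dfaexec_main is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DFA state.  */
struct toy_state
{
  bool has_mbcset;   /* state contains a multibyte bracket expression */
  bool accepting;    /* state may accept (the `fails'-like path) */
  int next;          /* index of the next state, or -1 to stop */
};

/* Walk STATES starting at index S and return the number of transitions
   taken.  If BACKREF is non-null and a state with an MBCSET is reached,
   set *BACKREF to 1 and stop early, so that the caller can retry with
   another matcher.  */
int
toy_exec (struct toy_state const *states, int s, int *backref)
{
  int steps = 0;

  assert (states != NULL);

/* The fallback check lives inside the macro, so both use sites below
   get it and cannot diverge.  */
#define TOY_TRANSITION()                        \
  do {                                          \
    if (states[s].has_mbcset && backref)        \
      {                                         \
        *backref = 1;                           \
        goto done;                              \
      }                                         \
    s = states[s].next;                         \
    steps++;                                    \
  } while (0)

  while (s >= 0)
    {
      if (states[s].accepting)
        TOY_TRANSITION ();   /* transition at an accepting position */
      else
        TOY_TRANSITION ();   /* transition at any other position */
    }
#undef TOY_TRANSITION

 done:
  return steps;
}
```

With the check inside the macro, a state that has an MBCSET triggers the
fallback whichever of the two sites reaches it; when BACKREF is null the
toy walker simply keeps going.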

From 3d006b605337b2b2aa6f23557a1949f6f5de4613 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 19 Oct 2014 10:40:18 +0900
Subject: [PATCH] dfa: fall MBCSET back to the glibc matcher in transition at
 acceptable position

The DFA matcher doesn't support collating symbols and equivalence
classes, so also fall MBCSET back to the glibc matcher in a transition
at an acceptable position.  At other positions, it already falls back.

* src/dfa.c (dfaexec_main): Do it.
---
 src/dfa.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 58a4b83..de83689 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3338,20 +3338,20 @@ dfaexec_main (struct dfa *d, char const *begin, char *end,
                   continue;
                 }
 
-              /* Falling back to the glibc matcher in this case gives
-                 better performance (up to 25% better on [a-z], for
-                 example) and enables support for collating symbols and
-                 equivalence classes.  */
-              if (d->states[s].has_mbcset && backref)
-                {
-                  *backref = 1;
-                  goto done;
-                }
-
               /* The following code is used twice.
                  Use a macro to avoid the risk that they diverge.  */
 #define State_transition()                                              \
   do {                                                                  \
+              /* Falling back to the glibc matcher in this case gives   \
+                 better performance (up to 25% better on [a-z], for     \
+                 example) and enables support for collating symbols and \
+                 equivalence classes.  */                               \
+              if (d->states[s].has_mbcset && backref)                   \
+                {                                                       \
+                  *backref = 1;                                         \
+                  goto done;                                            \
+                }                                                       \
+                                                                        \
               /* Can match with a multibyte character (and multi-character \
                  collating element).  Transition table might be updated.  */ \
               s = transit_state (d, s, &p, (unsigned char *) end);      \
-- 
2.1.1
