Jim Meyering <j...@meyering.net> wrote:
> dfa.c's match_mb_charset function *is* used, e.g., in a
> command like this one:
>
>   printf '\0' |src/grep -aE '^\s?$'

Wow, that isn't good.  I think the behavior of `fails' should be the same
as that of `trans', except that `fails' checks the accepting conditions.
That includes the part quoted below.  match_mb_charset() should be avoided
as far as possible, as it doesn't support collating symbols and
equivalence classes.

>           /* Falling back to the glibc matcher in this case gives
>              better performance (up to 25% better on [a-z], for
>              example) and enables support for collating symbols and
>              equivalence classes.  */
>           if (d->states[s].has_mbcset && backref)
>             {
>               *backref = 1;
>               goto done;
>             }

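To illustrate the idea outside dfa.c, here is a minimal, self-contained sketch.  Everything in it (struct toy_state, toy_exec, TOY_TRANSITION) is hypothetical and much simpler than the real code.  It only shows why putting the fallback check inside a macro that is expanded twice makes both transition sites bail out in the same way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DFA state.
   This is NOT the real struct from dfa.c.  */
struct toy_state
{
  bool has_mbcset;  /* state may match a multibyte bracket expression */
  bool accepting;   /* state is at an accepting position */
  int next;         /* toy transition: next state index, or -1 */
};

/* Walk the toy automaton for at most STEPS transitions starting at START
   and return the state reached.  If a state with has_mbcset is about to
   make a transition and BACKREF is non-null, set *BACKREF to 1 and stop,
   which tells the caller to use a slower, more capable matcher.  */
int
toy_exec (struct toy_state const *states, int start, size_t steps,
          int *backref)
{
  int s = start;

  /* The check lives inside the macro, so every expansion shares it.  */
#define TOY_TRANSITION()                        \
  do {                                          \
    if (states[s].has_mbcset && backref)        \
      {                                         \
        *backref = 1;                           \
        goto done;                              \
      }                                         \
    s = states[s].next;                         \
  } while (0)

  for (size_t i = 0; i < steps && s >= 0; i++)
    {
      if (states[s].accepting)
        TOY_TRANSITION ();      /* use site 1: accepting position */
      else
        TOY_TRANSITION ();      /* use site 2: any other position */
    }

#undef TOY_TRANSITION
 done:
  return s;
}
```

With the check placed before the macro instead, only the code path that reaches that statement would fall back.  A path that jumps straight to one of the expansions would skip it.
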
From 3d006b605337b2b2aa6f23557a1949f6f5de4613 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 19 Oct 2014 10:40:18 +0900
Subject: [PATCH] dfa: fall MBCSET back to the glibc matcher in transition at acceptable position

DFA doesn't support collating symbols and equivalence classes, so fall
MBCSET back to the glibc matcher in transition at acceptable position.
BTW at other positions, has already fallen it back.

* src/dfa.c (dfaexec_main): Do it.
---
 src/dfa.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 58a4b83..de83689 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3338,20 +3338,20 @@ dfaexec_main (struct dfa *d, char const *begin, char *end,
               continue;
             }
 
-          /* Falling back to the glibc matcher in this case gives
-             better performance (up to 25% better on [a-z], for
-             example) and enables support for collating symbols and
-             equivalence classes.  */
-          if (d->states[s].has_mbcset && backref)
-            {
-              *backref = 1;
-              goto done;
-            }
-
           /* The following code is used twice.
              Use a macro to avoid the risk that they diverge.  */
 #define State_transition()                                              \
   do {                                                                  \
+              /* Falling back to the glibc matcher in this case gives  \
+                 better performance (up to 25% better on [a-z], for    \
+                 example) and enables support for collating symbols and \
+                 equivalence classes.  */                               \
+              if (d->states[s].has_mbcset && backref)                   \
+                {                                                       \
+                  *backref = 1;                                         \
+                  goto done;                                            \
+                }                                                       \
+                                                                        \
               /* Can match with a multibyte character (and multi-character \
                  collating element).  Transition table might be updated.  */ \
               s = transit_state (d, s, &p, (unsigned char *) end);      \
-- 
2.1.1