On Sat, 18 Jul 2015 22:15:33 -0700
Jim Meyering <j...@meyering.net> wrote:

> Hello,
> Thank you for the patches in this report:
> 
>   http://bugs.gnu.org/19306
> 
> Please excuse my delay in getting back to you on this.
> Would you revise each of those to include a test case
> that demonstrates the problem/fix?

Thanks for reviewing this report.

This is not a bug fix.  It avoids the case where a BACKREF is found
during DFAEXEC and the line is then passed to regex in a multibyte
locale.  In other words, if a pattern includes a BACKREF, grep does not
try to use the DFA at all.

I measured a speed-up of about 10% for the attached test case.

  Before patching: real 7.29  user 7.26  sys 0.02
  After patching : real 6.57  user 6.55  sys 0.01
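
As a quick check of the "about 10%" figure, the relative saving can be
computed from the two "real" timings above.  This is only a small sketch;
the function name speedup_pct is illustrative and not part of the patch.

```sh
#!/bin/sh
# Hypothetical helper: percentage reduction in run time,
# given the time before ($1) and after ($2) patching.
speedup_pct () {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

speedup_pct 7.29 6.57   # prints 9.9
```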

KWset and the DFA superset succeed for all rows in the test case, and
the multibyte DFA succeeds, too.  However, all rows are then rejected by
regex.

After patching, grep does not try the multibyte DFA, because the pattern
includes a BACKREF.
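
The decision itself can be illustrated roughly in shell.  This is only a
sketch of the idea, not the code in the patch: has_backref is a
hypothetical helper that does a crude textual check for \1..\9 in a
basic regular expression, and it does not handle bracket expressions
such as [\1].

```sh
#!/bin/sh
# Hypothetical helper: succeed if the BRE in $1 appears to contain a
# back-reference (\1 .. \9).  Escaped backslashes (\\) are removed first
# so that a literal backslash followed by a digit is not miscounted.
has_backref () {
  printf '%s\n' "$1" | sed 's/\\\\//g' | LC_ALL=C grep -q '\\[1-9]'
}

# Mirrors the idea above: if the pattern has a BACKREF, skip the DFA.
if has_backref '\(AB\)ABAB\1'; then
  echo "skip DFA"
else
  echo "try DFA"
fi
```

With the pattern shown, the check succeeds and the script reports
"skip DFA"; a pattern without a back-reference takes the other branch.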

In addition, I believe that the DFA code is simplified by removing the
BACKREF handling from dfaanalyze(), dfassbuild() and dfaexec().

#!/bin/sh

# Map A and B to the two bytes of one EUC-JP multibyte character.
make_input () {
  printf '%s\n' "$1" | tr AB '\244\263'
}

yes "$(make_input BABABABABABABABABABABABABABABABABABABABAABABABABABABABABABBA)" |
  head -n 1000000 >in
make_input '\(AB\)ABABABABABABABAB\1' >pat

env LC_ALL=ja_JP.eucJP time -p src/grep -f pat in
