On Sat, 18 Jul 2015 22:15:33 -0700 Jim Meyering <j...@meyering.net> wrote:
> Hello,
>
> Thank you for the patches in this report:
>
> http://bugs.gnu.org/19306
>
> Please excuse my delay in getting back to you on this.
> Would you revise each of those to include a test case
> that demonstrates the problem/fix?

Thanks for reviewing this report.

This is not a bug fix. Without the patch, in a multibyte locale grep only discovers a BACKREF while running dfaexec(), and then hands the line over to regex. With the patch, if a pattern includes a BACKREF, grep does not try to use the DFA in the first place.

I measured about a 10% speed-up for the test case in the attachment (times in seconds, from time -p):

  Before patching: real 7.29  user 7.26  sys 0.02
  After patching:  real 6.57  user 6.55  sys 0.01

In the test case, KWset and the DFA superset match every line, and so does the multibyte DFA. However, regex then rejects every line. After patching, grep skips the multibyte DFA because the pattern includes a BACKREF.

In addition, I believe the DFA code can be simplified by removing the BACKREF handling from dfaanalyze(), dfassbuild() and dfaexec().
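A minimal sketch of the up-front decision described above, written in sh to match the attached test. It is only an illustration: the function names has_backref and choose_matcher are hypothetical, and the check is a rough textual approximation that does not understand bracket expressions such as [\1]. In grep itself, back-references are recognized by the pattern parser in dfa.c (the BACKREF token mentioned above), not by a textual scan like this.

```sh
# Hypothetical helper (not part of grep): succeed if the BRE pattern in $1
# appears to contain a back-reference \1 .. \9.  A backslash-digit pair
# counts only when preceded by an even number of backslashes, so "\\1"
# (a literal backslash followed by "1") is not treated as a back-reference.
# Approximation only: bracket expressions such as "[\1]" are not handled.
has_backref () {
  printf '%s\n' "$1" | grep -Eq '(^|[^\\])(\\\\)*\\[1-9]'
}

# Hypothetical dispatcher mirroring the idea of the patch: decide once,
# before searching, and send a pattern with a back-reference straight to
# regex instead of first trying the multibyte DFA on every line.
choose_matcher () {
  if has_backref "$1"; then
    echo regex
  else
    echo dfa
  fi
}
```

For the attached test pattern, choose_matcher '\(AB\)ABABABABABABABAB\1' prints regex, while a plain pattern such as ABAB prints dfa. Making the choice once per pattern avoids paying for a DFA pass whose matches regex will reject anyway, which is the cost the patch aims to remove.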
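#!/bin/sh

# Map A and B to the EUC-JP bytes 0xA4 and 0xB3.  printf is used instead
# of echo so that shells whose echo interprets escapes keep \1 intact.
make_input () {
  printf '%s\n' "$1" | tr AB '\244\263'
}

yes "$(make_input BABABABABABABABABABABABABABABABABABABABAABABABABABABABABABBA)" | head -n 1000000 >in
make_input '\(AB\)ABABABABABABABAB\1' >pat
env LC_ALL=ja_JP.eucJP time -p src/grep -f pat in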