On Sat, Jul 4, 2015 at 8:40 PM, Jim Meyering <j...@meyering.net> wrote: > On Mon, Nov 24, 2014 at 5:15 PM, Norihiro Tanaka <nori...@kcn.ne.jp> wrote: ... > Thank you for that patch. > I have rebased it and made some small improvements: > I combined an if+do loop into a single for-loop and moved > some declarations "down". I constructed a reproducer that > does not require two large inputs to demonstrate the > performance improvement. > > I've also begun to reword the commit log, but am out of time > for this evening, so will post this here, for now. > > I am still trying to convince myself that this is a strict > improvement, i.e., that the O(N^2) strstr calls avoided > by this change served no purpose.
I have pushed that. I have also pushed a test (attached) for this performance fix in a separate commit. As with most performance-measuring tests, it may be fragile, especially when many tests are run in parallel, but it passes for me on a few different systems.
From d940cf045c4e777d37b19ee4639623f2c4a7e062 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@fb.com> Date: Sun, 5 Jul 2015 11:43:47 -0700 Subject: [PATCH] tests: add a test for the performance fix * tests/long-pattern-perf: New file. * tests/Makefile.am (TESTS): Add it. --- tests/Makefile.am | 1 + tests/long-pattern-perf | 50 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) create mode 100755 tests/long-pattern-perf diff --git a/tests/Makefile.am b/tests/Makefile.am index 629d322..c95d5a9 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -80,6 +80,7 @@ TESTS = \ khadafy \ kwset-abuse \ long-line-vs-2GiB-read \ + long-pattern-perf \ match-lines \ max-count-overread \ max-count-vs-context \ diff --git a/tests/long-pattern-perf b/tests/long-pattern-perf new file mode 100755 index 0000000..cba6553 --- /dev/null +++ b/tests/long-pattern-perf @@ -0,0 +1,50 @@ +#!/bin/sh +# grep-2.21 would incur a 100x penalty for 10x increase in regexp length + +# Copyright 2015 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +require_timeout_ +require_hi_res_time_ + +fail=0 + +echo x > in || framework_failure_ +# We could use seq -s '' (avoiding the tr filter), but I +# suspect some version of seq does not honor that option. +# Note that we want 10x the byte count (not line count) in the larger file. +seq 5000 | tr -d '\012' > re || framework_failure_ +seq 40000 | tr -d '\012' > re-10x || framework_failure_ + +start=$(hi_res_time_) +grep -f re in; st=$? +stop=$(hi_res_time_) +test $st = 1 || fail=1 + +# Increasing the length of the regular expression by a factor +# of 10 should cause no more than a 10x increase in duration. +# However, we'll draw the line at 30x to avoid false-positives. +# Use an integer; some 'timeout' implementations have trouble with +# floating-point. +n_sec=$( + $AWK 'BEGIN { print 1 + int (30 * ('$stop' - '$start'))}' < /dev/null +) + +# Expect no match, i.e., exit status of 1. Anything else is an error. +timeout $n_sec grep -f re-10x in; st=$? +test $st = 1 || fail=1 + +Exit $fail -- 2.3.7