On Sun, Apr 19, 2020 at 4:10 AM Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > On Sun, 19 Apr 2020 07:41:49 +0900 > Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > On Sat, 18 Apr 2020 00:22:26 +0900 > > Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > > > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > > Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > > > > > > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > > > Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > > > > > > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > > > Paul Eggert <egg...@cs.ucla.edu> wrote: > > > > > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, > > > > > > > bug#33357 > > > > > > > will come back. > > > > > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > > > > > My own thought was to not analyze the regular expression if we > > > > > > discover that the input is empty. :-) > > > > > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > > > including in follows before remove epsilon nodes. > > > > > > > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > > > > > It was improved previous patch. > > > > Sorry, correct patch is here. > > I made the previous patch even simpler. > > before: > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > real 7.24 > user 7.14 > sys 0.09 > > after: > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > real 0.62 > user 0.52 > sys 0.10
Thank you for this patch. I have rebased and made minor syntactic changes. I'll push it to gnulib soon, if not today, then by Monday. I am considering creating a test case in grep, but it feels too tight to be feasible: I would use a relative perf test, requiring that a passing test incur a perf cost of less than say 100x. Here's the beginnings of my attempt (note: this is just an outline -- obviously would not rely on having "time" in path or as a shell builtin): gen() { local n=$1 local i=1 while : ; do local pat=$(printf $i | sha1sum | cut -d' ' -f1) printf '%s\n' "$pat$pat(\$|$pat)" i=$(expr $i + 1) test $i = $n && break done } gen 4000 > pats-4000 head -400 pats-4000 > pats-400 # With fixed code, that a 10x input size increase (n=400 to 4000) # induces a 40x runtime increase: .05 -> 2.0s # Just prior to this change, it's 150x: 0.2 -> 30s env LC_ALL=C time -p src/grep -E -v -m1 -f pats-400 /dev/null env LC_ALL=C time -p src/grep -E -v -m1 -f pats-4000 /dev/null