bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

Paul Eggert Wed, 19 Sep 2018 00:51:01 -0700

Jim Meyering wrote:

   seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
   time -p env LC_ALL=C grep -vf in in


   (before) real 63.43 user 61.67 system 1.65
   (after)  real  1.64 user  1.61 system 0.03

   # Without the trailing '.' in each pattern, kwset is used instead of dfa.

grep also speeds up about 3x in the following case.

   time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words

   (before) real  2.48 user  2.09 system 0.38
   (after)  real  7.69 user  6.32 system 1.29

Thank you for the patches.
However, the before/after numbers you show here suggest that the
"after" code takes more than triple the time of "before".
Also, when I compared grep compiled at
123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your
changes) and that compiled at the prior commit,
9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version
takes over 30 seconds, while the prior one took about 20 seconds.

Is that last pair of times for the second benchmark he gave? I confess to being lazy and not trying that benchmark, as I was on an Ubuntu system that didn't have the linux.words file.

Did you try the first benchmark? On my Ubuntu 18.04 x86-64 (Xeon E3-1225 V2) platform, I got (before) real 55.51 user 51.53 sys 3.98, (after) real 0.64 user 0.60 sys 0.04, so the change is a big performance win there.

On the other hand, I just now did the 2nd benchmark with a copy of a Fedora 28 linux.words file, and got (before) real 8.06 user 6.20 sys 1.85, (after) real 21.69 user 21.21 sys 0.47, so it's about three times slower. Ouch.
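
For reference, the two ratios can be checked with a small shell helper. This is only a sketch: the function name `speedup` is hypothetical (it is not part of grep or of the patch), and the inputs are the "real" times quoted above. A result above 1 means the patched build is faster, below 1 means it is slower.

```shell
# Hypothetical helper, not part of grep: divide the "before" wall-clock
# time by the "after" time and print the ratio to two decimal places.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 55.51 0.64    # first benchmark (seq-generated patterns)
speedup 8.06 21.69    # second benchmark (Fedora 28 linux.words)
```

The first call prints 86.73 and the second prints 0.37, that is, the patched build takes roughly 2.7 times as long on the linux.words case.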
