Jim Meyering wrote:
seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
time -p env LC_ALL=C grep -vf in in
(before) real 63.43 user 61.67 system 1.65
(after) real 1.64 user 1.61 system 0.03
# If we do not add '.' at last in pattern, not dfa but kwset is used.
grep also speeds-up about 3x in following case.
time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words
/usr/share/dict/linux.words
(before) real 2.48 user 2.09 system 0.38
(after) real 7.69 user 6.32 system 1.29
Thank you for the patches.
However, the before/after numbers you show here suggest that the
"after" code takes more than triple of the time of "before". >
Also, when I compared grep compiled at
123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your
changes) and that compiled at the prior commit,
9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version
takes over 30 seconds, while the prior one took about 20 seconds.
Is that last pair of times for the second benchmark he gave? I confess to being
lazy and not trying that benchmark, as I was on an Ubuntu system that didn't
have the linux.words file.
Did you try the first benchmark? On my Ubuntu 18.04 x86-64 (Xeon E3-1225 V2)
platform, I got (before) real 55.51 user 51.53 sys 3.98, (after) real 0.64 user
0.60 sys 0.04, so the change is a big performance win there.
On the other hand, I just now did the 2nd benchmark with a copy of a Fedora 28
linux.words file, and got (before) real 8.06 user 6.20 sys 1.85, (after) real
21.69 user 21.21 sys 0.47, so it's about three times slower. Ouch.