Mel Flynn wrote:
Hi,
I've attached a little test script for grep's -i performance. I tried a few
different machines, and the 64-bit 7.2 machine I could steal doesn't seem to be
affected and outperforms pcregrep.
Note that pcregrep doesn't use POSIX regex, so it's not a good basis for
comparison. PCRE provides a POSIX-compliant interface to Perl-compatible
regex for those who are already familiar with the POSIX interface, but
the syntax is still Perl regex and not POSIX! That's why some people
get confused when PCRE comes up.
On i386 machines, grep -i is significantly slower:
i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
16Meg file result:
=>>> 16777216
=>>> fgrep
0.04 real 0.02 user 0.01 sys
0.04 real 0.03 user 0.01 sys
=>>> pcregrep
0.21 real 0.19 user 0.02 sys
0.21 real 0.20 user 0.00 sys
=>>> grep
0.04 real 0.02 user 0.01 sys << not -i
3.64 real 3.61 user 0.01 sys << -i
It's an interesting observation; I have never heard of this.
So it looks to me that, while there is a problem with case-insensitive
comparison, just rewriting the expression is an optimization grep could
perform.
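
To illustrate what such a rewrite might look like: the sketch below assumes
that "rewriting the expression" means expanding each ASCII letter of a literal
pattern into a two-character bracket expression, so that plain grep can be used
instead of grep -i. The attached script is not reproduced here, so the function
name expand_case and the choice of basic-regex escaping are assumptions made
for illustration only, not what the script or grep actually does.

```python
# Hypothetical helper: expand a literal search string into a POSIX basic
# regular expression that matches it case-insensitively without -i.
# Assumption: only ASCII letters are expanded; everything else is literal.

BRE_SPECIAL = set(".[\\*^$")  # characters that are special in a POSIX BRE


def expand_case(literal: str) -> str:
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            # 'f' or 'F' becomes the bracket expression [fF]
            parts.append("[" + ch.lower() + ch.upper() + "]")
        elif ch in BRE_SPECIAL:
            # keep regex metacharacters literal by escaping them
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    print(expand_case("foo"))
```

With this, `grep -i foo file` could be compared against
`grep '[fF][oO][oO]' file`; whether the second form is actually faster depends
on the grep and regex library in use, as the numbers in this thread suggest.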
Either way, with the new text tools being written (done?), is this problem
being attacked, not fixable due to specifications, or not considered an issue?
Any PRs needed, or any I missed? Patches to try?
[And it just occurred to me bsdgrep is in ports]:
=>>> bsdgrep
0.93 real 0.74 user 0.00 sys
4.80 real 4.33 user 0.02 sys
4.97 real 4.34 user 0.01 sys
So here the optimization does not fly.
Unfortunately, this is the most important issue with the BSDL text tools. In
the grep case, the BSDL version is ready and feature-complete, but its
performance isn't quite satisfactory. The main reason is that GNU grep
uses a lot of shortcuts, which results in bloated code (8000 LOC),
while BSDL grep keeps everything simple and straightforward (1500 LOC).
IMO, the desired solution would be to keep grep small and get a modern,
well-performing regex library for FreeBSD. Pushing regex optimizations
into grep is a bad idea: it not only bloats the code, but other regex
users won't benefit from the optimization, so the problem should be
fixed at its root. And the current regex library we have is old, slow
and doesn't support wchar at all.
Btw, do you mind if I include your script in the BSD grep
distribution? I had already planned to write something like this for
future testing.
--
Gabor Kovesdan
FreeBSD Volunteer
EMAIL: ga...@freebsd.org .:|:. ga...@kovesdan.org
WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org