On 9/30/14 12:39, Paul Eggert wrote:
> GNU grep is smart
> enough to start matching at character boundaries without checking the
> validity of the input data. This helps it run faster. However, because
> libpcre requires a validity prepass, grep -P must slow down and do the
> validity check one way or another. Grep does this only when libpcre is
> used, and that's one reason grep -P is slower than plain grep.
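(For anyone following along: that prepass amounts to an extra O(n)
scan over each input buffer before matching begins, roughly like the
sketch below. This is just an illustration, not grep's actual code,
and it assumes the caller has already done setlocale (LC_ALL, "").)

  /* Rough sketch, not grep's code: check that a whole buffer is valid
     multibyte text (e.g. UTF-8) in the current locale before handing
     it to a matcher that cannot tolerate encoding errors.  */
  #include <stdbool.h>
  #include <string.h>
  #include <wchar.h>

  static bool
  buffer_is_valid_mb (char const *buf, size_t size)
  {
    mbstate_t st;
    memset (&st, 0, sizeof st);
    for (size_t i = 0; i < size; )
      {
        size_t len = mbrlen (buf + i, size - i, &st);
        if (len == (size_t) -1 || len == (size_t) -2)
          return false;         /* invalid or truncated sequence */
        if (len == 0)           /* mbrlen reports 0 for a NUL byte */
          len = 1;
        i += len;
      }
    return true;
  }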
Now that grep master on Savannah has been changed to use PCRE2 instead
of PCRE, this 'grep -P' performance problem appears to be fixed: the
following two commands now take about the same amount of time:
  grep -P zzzyyyxxx 10840.pdf
  pcre2grep -U zzzyyyxxx 10840.pdf
where the file is from <http://research.nhm.org/pdfs/10840/10840.pdf>.
Formerly, 'grep -P' was about 10x slower on this test.
My guess is that the 'grep -P' performance boost comes from
bleeding-edge grep using PCRE2's PCRE2_MATCH_INVALID_UTF option.
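For reference, here is a minimal sketch (not the code grep actually
uses) of the PCRE2 calls involved. Compiling with
PCRE2_UTF | PCRE2_MATCH_INVALID_UTF lets pcre2_match run on a subject
that may contain invalid UTF-8, so no separate validity scan is
needed; if I remember right, that option needs PCRE2 10.34 or later.
The pattern, subject string and exit codes below are made up, and the
program links with -lpcre2-8.

  /* Minimal sketch, not grep's code: match against a buffer that may
     contain invalid UTF-8, relying on PCRE2_MATCH_INVALID_UTF rather
     than a separate validity prepass.  */
  #define PCRE2_CODE_UNIT_WIDTH 8
  #include <pcre2.h>
  #include <stdio.h>

  int
  main (void)
  {
    PCRE2_SPTR pat = (PCRE2_SPTR) "zzzyyyxxx";
    /* The subject deliberately contains an invalid UTF-8 byte (0xFF). */
    static unsigned char const subj[] = "abc \xff zzzyyyxxx def";
    int err;
    PCRE2_SIZE erroff;

    pcre2_code *re = pcre2_compile (pat, PCRE2_ZERO_TERMINATED,
                                    PCRE2_UTF | PCRE2_MATCH_INVALID_UTF,
                                    &err, &erroff, NULL);
    if (!re)
      return 2;

    pcre2_match_data *md = pcre2_match_data_create_from_pattern (re, NULL);
    int rc = pcre2_match (re, subj, sizeof subj - 1, 0, 0, md, NULL);
    puts (rc >= 0 ? "match" : "no match");

    pcre2_match_data_free (md);
    pcre2_code_free (re);
    return rc >= 0 ? 0 : 1;
  }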
I'm closing this old bug report <https://bugs.gnu.org/18454>. We can
always reopen it if there are still performance issues that I've missed.