Dear grep maintainers, I've realized that grep -P takes quadratic time on long lines with many short matches. For instance, executing './src/grep -P -o "a" a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes time quadratic in N. I've used grep-3.3 and pcre-8.43 for the benchmarks.
The root causes for this behavior are as follows: 1. in src/pcresearch.c on l. 222 (at commit cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is searched each time a new match is searched; this already results in quadratic runtime in the above mentioned case 2. the function 'pcre_exec' from pcre-8.43 called in src/pcresearch.c on l. 71 for each match checks if the provided string is valid UTF-8 (code implemented in pcre_valid_utf8.c); this also results in quadratic runtime On your side, it is possible to fix the first root cause. I'll post an e-mail to PCRE mailing list about the second root cause. Best regards, Martin