On Sat, 20 Dec 2014 02:23:27 +0100 Vincent Lefevre <vinc...@vinc17.net> wrote:
> Debian grep 2.20-3    6.64s (with -P)
> Upstream grep 2.21    5.39s (with -P)
> Debian pcregrep 8.35  0.71s

Did you use pcregrep --utf-8? For a fair comparison you should run
pcregrep with --utf-8.

By the way, pcregrep --utf-8 does not support binary files. Once
pcregrep has found 20 errors, it gives up and exits without reading the
rest of the input.

$ yes src/grep | head -1000 | xargs cat > big_grep
$ ls -l big_grep
-rw-r--r--. 1 staff users 611453000 Dec 20 11:30 big_grep
$ time -p env LC_ALL=en_US.utf8 src/grep -P test big_grep
real 10.16
user 10.09
sys 0.07
$ time -p pcregrep --buffer-size=65536 test big_grep
real 1.50
user 1.41
sys 0.09
$ time -p pcregrep --buffer-size=65536 --utf-8 test big_grep 2>&1 | tail -1
pcregrep: Too many errors - abandoned.
real 0.00
user 0.00
sys 0.00
$ pcregrep --version
pcregrep version 8.36 2014-09-26
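
Since the --utf-8 run above aborts almost immediately on binary input, its
timing says nothing about matching speed. A minimal sketch of checking a
test file before timing it follows. The helper name is_valid_utf8 is
hypothetical (it is not part of grep or pcregrep), and the sketch assumes
an iconv command is installed:

```shell
# Hypothetical helper, not part of grep or pcregrep.
# Succeeds only if the whole file decodes as UTF-8.
# Assumption: an iconv command is available on PATH.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Possible use before a benchmark run (file name taken from the transcript):
#   if is_valid_utf8 big_grep; then
#       time -p pcregrep --buffer-size=65536 --utf-8 test big_grep
#   else
#       echo "big_grep is not valid UTF-8" >&2
#   fi
```

A file that passes this check should let pcregrep --utf-8 read to the end,
so its timing can be set against grep -P in a UTF-8 locale.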