With the patch that fixes bug 18266, grep -P works again on binary files (with invalid UTF-8 sequences), but it is now significantly slower than old versions (which could yield undefined behavior).
Timings with the Debian packages on my personal svn working copy
(binary + text files):

  2.18-2   0.9s with -P, 0.4s without -P
  2.20-3  11.6s with -P, 0.4s without -P

On this example, that's about a 13x slowdown!

Though the performance issue would be better fixed in libpcre3, I
suppose that it is not so simple and won't happen any time soon.
Things could be done in grep:

1. Ignore -P when the pattern would have the same meaning without -P
   (patterns could also be transformed, e.g. "a\d+b" -> "a[0-9]\+b",
   at least for the simplest cases).

2. Call PCRE in the C locale when this is equivalent.

3. Transform invalid bytes to null bytes in-place before the PCRE
   call. This changes the current semantics, but:
     * the semantics on invalid bytes have never been specified, AFAIK;
     * the best *practical* behavior may not be the current one
       (I personally prefer to be able to match invalid bytes, just
       like one can match top-bit-set characters in the C locale,
       and seeing such invalid bytes as equivalent to null bytes
       would not be a problem for most users, IMHO -- things can
       also be configurable).

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
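
To illustrate proposal 3 (invalid bytes -> null bytes, in-place), here
is a rough sketch. It is not code from grep: the function names
scrub_invalid_utf8 and utf8_seq_len are hypothetical, and it assumes
that the buffer is writable and that "invalid" means "not part of a
well-formed UTF-8 sequence" in the sense of the Unicode standard
(overlong forms, surrogates and values above U+10FFFF are rejected).
A sequence truncated at the end of the buffer is treated as invalid
here, which a real implementation reading a file in chunks would
probably need to handle differently.

```c
#include <stddef.h>

/* Length of the well-formed UTF-8 sequence starting at p (avail >= 1
   bytes are readable), or 0 if p does not start one.  */
static size_t
utf8_seq_len (const unsigned char *p, size_t avail)
{
  unsigned char c = p[0];
  unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for 2nd byte */
  size_t n, i;

  if (c < 0x80)
    return 1;                          /* ASCII */
  else if (c >= 0xC2 && c <= 0xDF)
    n = 2;
  else if (c >= 0xE0 && c <= 0xEF)
    {
      n = 3;
      if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
      if (c == 0xED) hi = 0x9F;        /* reject surrogates */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      n = 4;
      if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
      if (c == 0xF4) hi = 0x8F;        /* reject values > U+10FFFF */
    }
  else
    return 0;   /* stray continuation byte, 0xC0, 0xC1, 0xF5..0xFF */

  if (avail < n)
    return 0;                          /* truncated sequence */
  if (p[1] < lo || p[1] > hi)
    return 0;
  for (i = 2; i < n; i++)
    if (p[i] < 0x80 || p[i] > 0xBF)
      return 0;
  return n;
}

/* Replace every byte of buf[0..len-1] that is not part of a
   well-formed UTF-8 sequence by a null byte, in-place.
   Return the number of bytes replaced.  */
size_t
scrub_invalid_utf8 (unsigned char *buf, size_t len)
{
  size_t i = 0, replaced = 0;

  while (i < len)
    {
      size_t n = utf8_seq_len (buf + i, len - i);
      if (n == 0)
        {
          buf[i++] = 0;                /* invalid byte -> null byte */
          replaced++;
        }
      else
        i += n;                        /* skip the valid sequence */
    }
  return replaced;
}
```

With such a pass done first, the buffer is valid UTF-8 by construction,
so the idea would be that the PCRE call no longer needs to stop on
invalid sequences; whether this actually recovers the lost performance
has not been measured.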