On 09/30/2014 11:10 AM, Zoltán Herczeg wrote:
>> Grep already does that sort of thing. And it's smart enough to start matching
>> only at character boundaries. It's not libpcre's job to worry about this; the
>> caller can worry about it.
>
> Thank you for bringing this up. I don't see any point of reimplementing what is
> already there.

Sorry, it sounds like my earlier comment was unclear. GNU grep is smart
enough to start matching at character boundaries without checking the
validity of the input data. This helps it run faster. However, because
libpcre requires a validity prepass, grep -P must slow down and do the
validity check one way or another. Grep does this only when libpcre is
used, and that's one reason grep -P is slower than plain grep.
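
To make the "start matching at character boundaries" point concrete, here is a minimal sketch in Python. It is not grep's actual code. It assumes UTF-8 input, and the function name is made up for illustration. The idea is that aligning to a boundary only needs a look at the top two bits of each byte, with no validation of the surrounding data.

```python
def next_char_boundary(buf: bytes, i: int) -> int:
    """Return the first index >= i that is not a UTF-8 continuation byte.

    Hypothetical helper for illustration only. It does NOT check that the
    buffer is valid UTF-8; it just skips bytes of the form 10xxxxxx.
    """
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i


data = "naïve café".encode("utf-8")
# Byte offset 3 falls inside the two-byte encoding of "ï", so the helper
# moves forward to the start of the next character.
start = next_char_boundary(data, 3)
```

Nothing here decodes or validates a character, which is why this kind of alignment is cheap.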
It's not a question of duplicating code: grep already has code to
validate binary data. It's a question of performance. Requiring a
prepass for validity checking is typically slower (or takes more energy,
or whatever) than checking validity on the fly. And in many cases going
multithreaded would just make matters worse.
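
The difference between the two approaches can be sketched roughly as follows. This is an illustration in Python under simplifying assumptions, not libpcre's or grep's implementation: both function names are hypothetical, the "pattern" is a fixed UTF-8 byte string, and Python itself says nothing about the real performance gap. The first function validates the whole buffer before searching; the second validates each character only as the scan reaches it, and stops at the first match.

```python
def utf8_char_len(buf: bytes, i: int) -> int:
    """Length of the valid UTF-8 character starting at buf[i], or 0 if invalid."""
    b = buf[i]
    if b < 0x80:
        n = 1
    elif 0xC2 <= b <= 0xDF:
        n = 2
    elif 0xE0 <= b <= 0xEF:
        n = 3
    elif 0xF0 <= b <= 0xF4:
        n = 4
    else:
        return 0  # continuation byte or illegal lead byte
    try:
        # Strict decode rejects truncated, overlong, and surrogate sequences.
        buf[i:i + n].decode("utf-8")
    except UnicodeDecodeError:
        return 0
    return n


def find_with_prepass(buf: bytes, needle: bytes) -> int:
    """Two passes: validate everything first, then search."""
    try:
        buf.decode("utf-8")  # pass 1 touches every byte of the buffer
    except UnicodeDecodeError as e:
        raise ValueError(f"invalid UTF-8 at byte {e.start}") from None
    return buf.find(needle)  # pass 2: the actual search (-1 if absent)


def find_on_the_fly(buf: bytes, needle: bytes) -> int:
    """One pass: validate each character only as the scan reaches it."""
    i = 0
    while i < len(buf):
        if buf.startswith(needle, i):
            return i  # bytes after the match are never examined
        n = utf8_char_len(buf, i)
        if n == 0:
            raise ValueError(f"invalid UTF-8 at byte {i}")
        i += n
    return -1
```

On valid input both functions return the same offset. They differ when invalid bytes follow the first match: the prepass version rejects the buffer, while the on-the-fly version never reads that far.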
I can understand that you don't want to take on the burden of making a
nontrivial libpcre performance improvement. Also, I hope 'grep -P'
performance, though not great, is good enough now to satisfy most
users. So perhaps we should just give the topic a rest.