2016-11-18 09:47:50 -0800, Paul Eggert:
> Stephane Chazelas wrote:
> >$ time grep -Pz '(?-m)^/' ~/a > /dev/null
>
> It looks like you want "^" to stand for a newline character, not the
> start of a line. That is not how grep -z works. -z causes the null
> byte to be the line delimiter, and "^" should stand for a position
> immediately after a null byte (or at start of file).
[...]

No, sorry if I wasn't very clear: it's the other way round, and
that's the whole point of this discussion.

grep had a bug in that it was calling pcre_exec on the content
of each null-delimited record with a regex compiled with
PCRE_MULTILINE. That caused

    printf 'a\nb\0' | grep -zP '^b'

to match even though the record doesn't start with a "b". To
work around it, you have to disable the PCRE_MULTILINE flag in
the regexp syntax with the (?-m) PCRE operator, or use \A
instead of ^.

The problem was /fixed/ (and I'm arguing here that it's the
wrong fix) by disallowing ^ with -Pz, while the obvious fix is
to remove that PCRE_MULTILINE flag.

As it turns out, PCRE_MULTILINE is there because in the old
days, before grep -Pz was supported, grep -P (without -z) would
pass more than one line to pcre_exec. If you look at the grep
bug history, 90% of the grep pcre-related bugs were caused by
that. It was fixed/changed in

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=a14685c2833f7c28a427fecfaf146e0a861d94ba

but Paolo forgot to remove the PCRE_MULTILINE flag when the code
was changed to pass one line at a time to pcre_exec, at which
point PCRE_MULTILINE was no longer needed (and it later caused
problems once grep -Pz was supported).

> It might be nice to have a syntax for matching a newline byte with
> -z (or a null byte without -z, for that matter). But that would be a
> new feature.

That feature is already there. That's the (?m) PCRE operator.
That's the whole point. That m flag (PCRE_MULTILINE) is on by
default in GNU grep, and that's what is causing all the
problems. Once you turn it off *by default*, ^ matches the
beginning of the NUL-delimited record as it should, and one can
use (?m) if one wants ^ to match the beginning of each line in
the NUL-delimited record instead of just the beginning of the
record.

-- 
Stephane
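
For illustration only, the anchor semantics discussed above can be
sketched with Python's re module. This is an assumption-laden analogy,
not grep's code: Python's re is not PCRE, re.MULTILINE merely plays the
role of PCRE_MULTILINE here, and record_matches is a hypothetical
helper standing in for one pcre_exec call per NUL-delimited record.

```python
import re

# One NUL-delimited record, as produced by: printf 'a\nb\0'
data = "a\nb\0"
records = [r for r in data.split("\0") if r]  # a single record: "a\nb"


def record_matches(pattern, record, multiline):
    """Hypothetical stand-in for matching one record.

    multiline=True mimics compiling with a multiline flag on by default;
    multiline=False mimics compiling without it.
    """
    flags = re.MULTILINE if multiline else 0
    return re.search(pattern, record, flags) is not None


record = records[0]

# Multiline on by default: ^ also matches after the embedded newline,
# so the record matches although it does not start with "b".
print(record_matches(r"^b", record, multiline=True))

# Multiline off by default: ^ only matches at the start of the record.
print(record_matches(r"^b", record, multiline=False))

# \A always means start of subject, whatever the multiline setting.
print(record_matches(r"\Ab", record, multiline=True))

# With multiline off by default, (?m) opts back in to per-line anchors.
print(record_matches(r"(?m)^b", record, multiline=False))
```

With the flag off by default, per-line matching stays available through
(?m), which is the behaviour argued for above.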