On Mon, May 15, 2017 at 8:24 AM, Junio C Hamano <[email protected]> wrote:
> Ævar Arnfjörð Bjarmason <[email protected]> writes:
>
>> Factor the test for \0 in grep patterns into a function. Since commit
>> 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
>> \0 is considered fixed as regcomp() can't handle it.
>>
>> This limitation was never documented, and other some regular
>> expression engines are capable of compiling a pattern containing a
>> \0. Factoring this out makes a subsequent change which does that
>> smaller.
>
> The latter half of the first sentence of this paragraph does not
> parse well, around "and other some". "never documented" makes
> readers expect "... so let's document it", but I think you are
> driving in a different direction, something like...
I'll try to reword it.
> Since some other regular expression engines are capable of a
> pattern to match a string with NUL in it, it would be nice
> if we can use such an engine and match against NUL; as this
> "patterns with NUL always use --fixed match" is not documented,
> let's hope that nobody depend on that current behaviour.
> This step does not yet change the behaviour yet, but it
> makes it easier to do so in later steps.
>
> perhaps? I tend to agree that it is OK to break users who depended
> on that "patterns with NULL match with --fixed" (partly because it
> is so unintuitive and not useful behaviour, and partly because it is
> easy for them to explicitly say -F), but let's make sure we clearly
> say that we will be knowingly breaking them.
I think there's a few different things going on here, which I'll
address accordingly.
* Yes, if someone is relying on the undocumented behavior of a \0
pattern implying -F if and when I change that their stuff will break.
I'm going to actually just drop this whole discussion from this
commit, we can have an explanation for that when and if I actually
change it in a later series.
* I'm making things more confusing by saying "engines" here, but it's
important to note that git has never in its docs promised to use the C
library implementation of POSIX regexes, it's just advertised that it
uses POSIX regular *expressions*, which are a different thing.
Implementation != syntax.
E.g. GNU grep uses POSIX regex syntax, but its engine does not have
the same limitation as ours:
$ (command cd /tmp && printf "æ\0ð\n" >a && printf "æ\n" >> a;
printf "æ\0[öð]" >f; grep -a -f f a; echo $?)
æð
0
I went down this particular rabbit hole because I was genuinely
surprised that this didn't work with git-grep, until I realized that
of course the implementation detail of using C strings for this under
the hood was bleeding through.