On Sun, Jan 31, 2016 at 11:30 PM, Paul Eggert <egg...@cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> I looked into it and do not see a way to fix it with reasonable cost.
>> When grep finds a NUL byte, it punts on the entire block.
>
> Hmm, I think of it differently. When grep finds a NUL it records that the
> file is binary but if no match has been found so far, it keeps looking for
> the first match, either in the block containing the NUL or in a later block.
> In this case grep stops reading the file only after finding a match, and it
> works correctly.
>
> The problem occurs if grep finds one or more text matches before the first
> NUL: it reports those matches, records the fact that it found them, then
> sees the NUL, then stops reading that file, and then the calling code
> notices that this was (1) a binary file that (2) contained matches, so
> outputs the "Binary file ... matches" message. This is wrong, because no
> binary data actually matched.
>
> When grep finds a NUL, it should record that the file is binary and then
> look for one more match after the NUL, then quit reading the file and report
> "Binary file ... matches" only if it found that one more match.
>
> The code is complicated by the fact that the file could also be binary
> because an output line contains an encoding error, something that's detected
> in a different part of the code.
>
> Argh, I'm taking too long to explain this. It's easier to fix than to
> explain. I installed a patch; what do you think?

Oh, I see now. Nicely done.

I noticed only now that I replied to you off-list. Didn't mean to.
So I am adding the bug address in Cc, so that your explanation is recorded there.

I added one more test case:
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=43f6246fe82f1
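
For anyone reading this in the bug log, here is a rough sketch of the behavior
Paul describes above. It is not grep's code: the function name grep_sketch, its
parameters, the line-at-a-time scan (grep works on buffers), and the plain
substring match are all assumptions made for illustration, and the separate
encoding-error case is left out.

```python
def grep_sketch(lines, needle, name="(input)"):
    """Return the output lines a grep-like scan would print (hypothetical model).

    lines  -- iterable of bytes objects, one per input line, without newlines
    needle -- bytes to look for (substring match, not a regex)
    name   -- file name used in the "Binary file ... matches" message
    """
    out = []
    seen_nul = False
    for line in lines:
        if b"\0" in line:
            # Record that the file is binary, but keep scanning.
            seen_nul = True
        if needle in line:
            if seen_nul:
                # One match at or after the first NUL is enough:
                # report it and stop reading the file.
                out.append(f"Binary file {name} matches")
                break
            # Match in the text that precedes any NUL: print it as usual.
            out.append(line.decode("ascii", "replace"))
    return out
```

With this model, a file whose only matches come before the first NUL prints
those lines and nothing else, while a match at or after the NUL adds the
"Binary file ... matches" line and ends the scan.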