Eric Blake wrote:
> POSIX allows this behavior, in that it says that grep's behavior is
> undefined on non-text files (which you have by virtue of your NUL
> byte). Since this is documented behavior of GNU grep when -a is not
> used, I'm closing this as not a bug. But feel free to add further
> comments to this thread.

If I start grep with the "-a" option or "--binary-files=text", the bug does not show up. With "grep --binary-files=binary", which is the default, the bug does show up. I am fairly sure that the auto-guessing code is incorrect or limited when a NUL character appears after the first 32KB.

The manual page says this about the auto-guessing code:

    -U, --binary
        Treat the file(s) as binary. By default, under MS-DOS and
        MS-Windows, grep guesses the file type by looking at the
        contents of the first 32KB read from the file. If grep decides
        the file is a text file, it strips the CR characters from the
        original file contents (to make regular expressions with ^ and
        $ work correctly). Specifying -U overrules this guesswork,
        causing all files to be read and passed to the matching
        mechanism verbatim; if the file is a text file with CR/LF pairs
        at the end of each line, this will cause some regular
        expressions to fail. This option has no effect on platforms
        other than MS-DOS and MS-Windows.

I see these problems:

1. Binary mode is implemented inconsistently. It would be acceptable for grep to produce either no output (no match, exit code >0) or exactly one output line ("Binary file testfile.txt matches", exit code 0). It is not acceptable for grep to write some matching text lines, then "Binary file testfile.txt matches", and exit with code 0.
2. Linux users, or more precisely users of any platform other than MS-DOS and MS-Windows, will overlook the auto-guessing section in the manual page because of the notes "By default, under MS-DOS and MS-Windows, grep guesses the file type by looking at the contents of the first 32KB read from the file." and "This option has no effect on platforms other than MS-DOS and MS-Windows."
3. The auto-guessing mechanism is not documented anywhere else in the documentation.
4. The auto-guessing limitations are documented, after a fashion, in the manual page, but not in its BUGS section.
5. The exit code should not be 0 if grep finds an error in the input from which it cannot recover.
6. The message "Binary file testfile.txt matches" must not be written to standard output if matching text lines have already been written.
7. POSIX defines only minimal guarantees for grep. GNU grep can, and should, do better.
8. Other implementations (such as the FreeBSD version I tested) do not show the bug. BusyBox also works correctly.
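
The inconsistency described in problem 1 can be modeled with a minimal Python sketch. It is not GNU grep's actual code. It assumes that the text/binary decision is made one buffer at a time, with a buffer size taken from the 32KB figure in the manual page, so that lines from an earlier buffer are already printed when a NUL byte turns up later. The names `BUF_SIZE`, `split_buffers`, `grep_per_buffer` and `grep_whole_file` are hypothetical and exist only for this illustration.

```python
from typing import List

BUF_SIZE = 32 * 1024  # assumption: buffer size taken from the 32KB in the manual page


def split_buffers(data: bytes, size: int = BUF_SIZE) -> List[bytes]:
    """Group whole lines into buffers of at most roughly `size` bytes."""
    buffers: List[bytes] = []
    current = b""
    for line in data.splitlines(keepends=True):
        if current and len(current) + len(line) > size:
            buffers.append(current)
            current = b""
        current += line
    if current:
        buffers.append(current)
    return buffers


def grep_per_buffer(data: bytes, pattern: bytes, name: str) -> List[str]:
    """Suspected behavior: the binary check runs per buffer, after output has begun."""
    out: List[str] = []
    buffers = split_buffers(data)
    for i, buf in enumerate(buffers):
        if b"\0" in buf:
            # From this buffer on, the input counts as binary:
            # report a match once (if there is one) and stop.
            if pattern in b"".join(buffers[i:]):
                out.append(f"Binary file {name} matches")
            break
        out.extend(l.decode("latin-1") for l in buf.splitlines() if pattern in l)
    return out


def grep_whole_file(data: bytes, pattern: bytes, name: str) -> List[str]:
    """Consistent alternative: decide text vs. binary once, before any output."""
    if b"\0" in data:
        return [f"Binary file {name} matches"] if pattern in data else []
    return [l.decode("latin-1") for l in data.splitlines() if pattern in l]


# A match in the first buffer, more than 32KB of padding, then a NUL byte.
sample = b"needle first\n" + b"padding line\n" * 4000 + b"needle\0second\n"

print(grep_per_buffer(sample, b"needle", "testfile.txt"))
print(grep_whole_file(sample, b"needle", "testfile.txt"))
```

Under these assumptions the per-buffer variant prints a matching text line followed by "Binary file testfile.txt matches", which is the mixed output complained about above, while the whole-file variant prints only the single "Binary file" line. Deciding up front costs an extra pass over the input (or buffering of output), which may be why an implementation would avoid it.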