On 04/06/2016 05:04 PM, Bjoern Jacke wrote:
> On 07.04.2016 00:33, Eric Blake wrote:
>> That behavior complies with POSIX requirements.
>
> can you give a quote here? One thing which is not POSIX compliant is
> that the diagnostic messages is given back on stdout.
> http://pubs.opengroup.org/onlinepubs/9699919799/ says:
>
> --snip--
> LC_MESSAGES
> Determine the locale that should be used to affect the format and
> contents of diagnostic messages written to standard error.
> --snap--

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html

STDIN
    The standard input shall be used if no file operands are specified,
    and shall be used if a file operand is '-' and the implementation
    treats the '-' as meaning standard input. Otherwise, the standard
    input shall not be used. See the INPUT FILES section.

INPUT FILES
    The input files shall be text files.

As soon as you supply grep with non-text-file input, POSIX no longer
applies, and we can do WHATEVER WE WANT. The violation is not in grep's
behavior, but in yours for passing a binary file.

We have chosen that WHATEVER WE WANT means that by default, we will tell
you (on stdout) that the binary file matches, but if you use the
(non-standard extension) -a option, we will pretend the file is text
anyways. And it's been documented that way for basically "forever" in
GNU grep.

What's changed recently is what we've done under the hood (more
efficient recognition of binary files, treating '\0' and '\n'
identically as line terminators when -a is not in effect because of the
speed improvements it lets us gain, and attempts with heuristics to
avoid spamming terminals or downstream clients with encoding errors when
-a is not in effect). But all of those still fall under the broad
category of WHATEVER WE WANT as it falls outside the POSIX standard.

And yes, maybe we could change grep to print the "Binary file matches"
message to stderr, but that in turn will probably break other scripts,
and lead to even more complaints from people doing non-standard things
and expecting consistent results.

That said, patches are still welcome, if you think you have better
heuristics than what we currently have, and as long as it still falls
within the realm of WHATEVER WE WANT.

> if you consider grepping text files with mixed encodings as invalid use
> of grep, then you should not return 0 and/or output the "Binary file
> (standard input) matches" on stdout. This makes the output of GNU grep
> look like a valid match.

Maybe changing the exit status when a binary file is encountered is
worth doing - but not returning status 0 when a match is detected is
more likely to do harm than good.

>
> You say "grep -a" is your friend to all the users, who want to grep log
> files (cause they tend to contain mixed encodings). Sure, -a is a
> workaround to make GNU grep work as before again. Realistically 99.99 of
> the users will not know that though, because this is the first grep
> version ever I guess, that requires this. Also -a is a GNU option only,
> so portable scripts will not be able to use that.

Portable scripts are not able to grep binary files, period. As long as
you don't mind non-portable extensions, 'grep -a' is what you want.

>
> I guess you are aware, that you will break a lot of existing scripts
> with that change of treating mixed encoding input files as binary like
> the way you do it now with GNU grep >= 2.23 ?

Yes, we are aware that lots of users are getting an education on the
subtleties of POSIX. But that doesn't mean it is a bug.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
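
For illustration only, the default-versus-`-a` behavior discussed in this message can be sketched in Python. This is not GNU grep's implementation: `looks_binary` and `search` are hypothetical names, and the heuristic used here (a NUL byte, or bytes that fail to decode as UTF-8) is an assumption standing in for grep's real, more involved checks. The sketch only shows why a log file with mixed encodings yields a one-line "matches" notice by default and the matching lines once the input is forced to be treated as text.

```python
def looks_binary(data: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical heuristic: a NUL byte or undecodable bytes count as binary."""
    if b"\0" in data:
        return True
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


def search(data: bytes, needle: bytes, as_text: bool = False) -> list[str]:
    """Return what a grep-like tool might print for a fixed-string search.

    as_text=True plays the role of the non-standard -a option: the input
    is treated as text even if it looks binary.
    """
    matches = [line for line in data.split(b"\n") if needle in line]
    if not matches:
        return []  # no match: nothing printed (grep would exit nonzero)
    if looks_binary(data) and not as_text:
        # Default: report only that something matched, not the line itself.
        return ["Binary file (standard input) matches"]
    # Text mode: print matching lines, replacing undecodable bytes.
    return [line.decode("utf-8", errors="replace") for line in matches]


if __name__ == "__main__":
    # A log with one Latin-1 byte (0xE9) mixed into otherwise ASCII text.
    log = b"caf\xe9 ok\nerror: disk full\n"
    print(search(log, b"error"))
    print(search(log, b"error", as_text=True))
```

In this sketch a match is reported in both modes, which mirrors the point made above that a match still means exit status 0; only what gets printed differs.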