On 05/12/2015 02:41 AM, Kamil Dudka wrote:
> On Monday 11 May 2015 21:27:35 Paul Eggert wrote:
>> Perhaps we can improve the behavior of grep by changing its heuristic
>> slightly. Currently grep reports "Binary file FOO matches" if it finds
>> binary data in FOO before it finds the first match. Instead, perhaps we
>> could change grep to report "Binary file FOO matches" when it sees that
>> it's about to generate binary *output* copied from FOO, regardless of
>> whether this output represents the first match. That is, when grep sees
>> that it's about to output binary data, grep instead outputs "Binary file
>> FOO matches" and then stops output for FOO (even if it already output some
>> lines for ordinary matches in FOO).
>>
>> This approach would fix the problem of grep trashing the output stream, and
>> it should be less drastic than grep's current approach, in that it would
>> make grep more likely to do what Kamil Dudka is asking for (assuming grep
>> is given mostly valid input interspersed with small amounts of binary
>> data).
>
> Thanks for the suggestion! I believe that such approach would work for me.
> Do you want me to write a patch implementing it?
>
> Eric, what do you think about the change proposed above?
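
For concreteness, the heuristic Paul describes might be sketched roughly as below. This is only an illustration in Python, not grep's actual code: the helper names `looks_binary` and `grep_lines` are hypothetical, "binary" is assumed to mean a NUL byte or a sequence that is not valid UTF-8 (real grep decides this according to the current locale), and matching is plain substring search rather than a regular expression.

```python
def looks_binary(line: bytes) -> bool:
    """Hypothetical test: a NUL byte or invalid UTF-8 counts as binary data."""
    if b"\0" in line:
        return True
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def grep_lines(name: str, lines: list[bytes], pattern: bytes) -> list[str]:
    """Return what would be printed for one file under the proposed heuristic."""
    out = []
    for line in lines:
        if pattern not in line:
            # Non-matching lines are never output, whether binary or not.
            continue
        if looks_binary(line):
            # About to output binary data: report it instead, then stop
            # producing output for this file.
            out.append(f"Binary file {name} matches")
            break
        out.append(line.decode("utf-8").rstrip("\n"))
    return out


if __name__ == "__main__":
    sample = [b"foo one\n", b"\xff\xfe junk\n", b"foo \xff two\n", b"foo three\n"]
    for printed in grep_lines("FOO", sample, b"foo"):
        print(printed)
```

In this sketch the binary line that does not match is skipped silently, the first ordinary match is printed as usual, and output for FOO stops at the first matching line that contains binary data.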

I'm still a bit worried that encoding errors encountered on input, even
though they don't match for output, may still cause issues for some
patterns (we've had cases of encoding errors causing 'grep -P' to go
into an infinite loop, for example); but yes, as the behavior is
undefined, we are still justified in adopting those heuristics, if
someone is willing to contribute a patch along those lines.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org