On 02/28/2016 11:13 PM, Paul Eggert wrote:
> These changes were put in partly due to security issues, not only having
> to do with grep's internals (the old 'grep' would dump core sometimes
> when given encoding errors), but also for the benefit of invokers
> expecting properly encoded text.
>
> To some extent we were stuck between a rock and a hard place here. No
> matter what 'grep' does, it will do the wrong thing for some usages. But
> overall we thought it better for grep's output to be valid text.
You are driving out demons by Beelzebub.
grep is a core component of every Unix system. You cannot change the
behaviour or interface of such a fundamental tool without incurring
substantial breakage. Keeping the old bug is far wiser than fixing it
and introducing a new one.
Copying faulty input to the output is a preferable failure mode to
dropping part of the expected output. People do not expect grep to
validate their input but they do expect grep to produce a complete
result set.
A text file with encoding problems is a text file and not a binary file.
>> $ find /etc/ssl/certs/ | LANG= grep pem
>
> Wouldn't the following be better?
>
> find /etc/ssl/certs/ -name '*.pem'
I'm not doing that. That was just an example to show how grep now gives
incorrect results.
Many more such cases can be constructed: any process that feeds tainted
(user-provided) strings to grep can now be made to fail. E.g., a process
that greps Apache logs for known exploit signatures will now fail if the
attacker sends a bogus user-agent string.
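A minimal sketch of that log-scanning scenario, assuming a grep that supports the -a (--text) option. The log lines, the signature, and the variable names are hypothetical and only serve as illustration. Forcing text mode and the C locale is one possible way to keep matching lines in the output even when the input contains encoding errors:

```shell
# Hypothetical log file: the second request carries a user-agent that
# contains a byte (octal 200) which is not valid UTF-8.
log=$(mktemp)
printf 'GET /index.html "Mozilla/5.0"\n' > "$log"
printf 'GET /cgi-bin/test.cgi "() { :; }; \200"\n' >> "$log"

# -a (--text) tells grep to process the input as text even if it looks
# binary, so matching lines are printed instead of being suppressed.
# -F treats the signature as a fixed string, and LC_ALL=C makes the
# match byte-oriented.
hits=$(LC_ALL=C grep -a -F '() { :; };' "$log" | wc -l | tr -d ' ')
echo "lines matching the signature: $hits"

rm -f "$log"
```

Whether a scanner should pass such bytes through unchanged is exactly the trade-off under discussion; the sketch only shows that the caller has to ask for it explicitly.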
Regards
--
Marcello Perathoner