Marcello Perathoner wrote:
> The new behaviour of grep -- to output 'Binary file matches' after output
> started
I assume that the "new behavior" you're talking about is for grep 2.21
(2014-11-23) and later, as that's the version of grep that started outputting
"Binary file matches" due to input encoding errors. For example, on my platform
(Ubuntu 15.10), the shell command:
LC_ALL=C awk 'BEGIN {for(i=1; i<256; i++) printf "%c %d\n", i, i}' |
LC_ALL=en_US.utf8 grep 126
outputs "Binary file (standard input) matches" in grep 2.21.
These changes were made partly for security reasons. Some of those had to do
with grep's internals (the old 'grep' would sometimes dump core when given
encoding errors), and some were for the benefit of invokers that expect
properly encoded text.
To some extent we were stuck between a rock and a hard place here. No matter
what 'grep' does, it will do the wrong thing for some usages. But overall we
thought it better for grep's output to be valid text.
I think you can work around the problem for unfixed backup2l by setting your
system's locale to a unibyte locale where all bytes are valid, such as
en_US.ISO-8859-15.
Of course backup2l should get fixed, regardless of what we do with 'grep' or
with your system locale.
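As a rough illustration of the locale workaround, here is the earlier test
pipeline with grep running in a unibyte locale instead of a UTF-8 one. This is
only a sketch under stated assumptions: it uses LC_ALL=C rather than
en_US.ISO-8859-15 because the C locale is always installed, and it relies on
grep 2.23 or later, which as far as I know treats every byte as a valid
character in the C locale. The function name is made up for the example.

```shell
# Hypothetical helper, for illustration only.
# Generates bytes 1..255, each followed by its decimal value, then searches
# them with grep running in the C locale.
# Assumption: grep 2.23 or later, where no byte counts as an encoding error
# in the C locale, so the input is not classified as binary.
grep_unibyte_demo() {
    LC_ALL=C awk 'BEGIN {for(i=1; i<256; i++) printf "%c %d\n", i, i}' |
    LC_ALL=C grep 126
}

grep_unibyte_demo
```

Under those assumptions grep should print the matching line itself rather than
a "Binary file (standard input) matches" message.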
> $ find /etc/ssl/certs/ | LANG= grep pem
Wouldn't the following be better?
find /etc/ssl/certs/ -name '*.pem'
This avoids false matches like '/etc/ssl/certs/pemmican'. Alternatively:
find /etc/ssl/certs/ -print | grep -a '\.pem$'
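To make the difference concrete, the three variants can be compared on a
scratch directory. This is a small sketch, not anything backup2l does; the
function names and the scratch directory are hypothetical, and the functions
cd into the directory first so that the directory's own path cannot influence
the match.

```shell
# Hypothetical helpers, for illustration only. Each lists files under
# directory $1, one name per line (so none of them is newline-safe).

# Unanchored match: also accepts names that merely contain "pem".
list_loose() { (cd "$1" && find . | LANG= grep pem); }

# Let find do the matching on the file name itself.
list_by_name() { (cd "$1" && find . -name '*.pem'); }

# Anchored regular expression, with -a so grep always treats input as text.
list_anchored() { (cd "$1" && find . -print | grep -a '\.pem$'); }

demo=$(mktemp -d)
touch "$demo/ca.pem" "$demo/pemmican"
list_loose "$demo"      # lists both ./ca.pem and ./pemmican
list_by_name "$demo"    # lists only ./ca.pem
list_anchored "$demo"   # lists only ./ca.pem
rm -rf "$demo"
```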
> It is easy to catch such a file from the internet or from song or picture
> metadata.
None of the above approaches will work for arbitrary file names ("off the
Internet"), because they all mishandle file names containing newlines. backup2l
needs to do something like this:
find /etc/ssl/certs/ -name '*.pem' -print0
or like this:
find /etc/ssl/certs/ -print0 | grep -az '\.pem$'
with the remaining code using null bytes instead of newlines to terminate file
names. This is the sort of thing that backup2l should be doing.
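One way the consuming side might look is sketched below. It is not taken from
backup2l; the helper name is hypothetical, and it assumes a find with -print0
and an xargs with -0 and -r, as in GNU findutils.

```shell
# Hypothetical helper, not part of backup2l.
# Runs the given command once per *.pem file under directory $1, passing the
# file name as the command's last argument. Names travel NUL-terminated from
# find to xargs, so a newline or space inside a name cannot split it.
# Assumption: find -print0 and xargs -0 / -r behave as in GNU findutils.
for_each_pem() {
    pem_dir=$1
    shift
    find "$pem_dir" -name '*.pem' -print0 | xargs -0 -r -n 1 "$@"
}
```

For example, "for_each_pem /etc/ssl/certs/ ls -ld" would run ls once per
matching file. The output of the "grep -az" variant is NUL-terminated as well,
so it can be fed to "xargs -0" in the same way.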