bug#22838: New 'Binary file' detection considered harmful

Paul Eggert Sun, 28 Feb 2016 14:14:33 -0800

Marcello Perathoner wrote:

The new behaviour of grep -- to output 'Binary file matches' after output
started

I assume that the "new behavior" you're talking about is for grep 2.21 (2014-11-23) and later, as that's the version of grep that started outputting "Binary file matches" due to input encoding errors. For example, on my platform (Ubuntu 15.10), the shell command:


LC_ALL=C awk 'BEGIN {for(i=1; i<256; i++) printf "%c %d\n", i, i}' |
LC_ALL=en_US.utf8 grep 126

outputs "Binary file (standard input) matches" in grep 2.21.
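For comparison, here is a minimal sketch of the same pipeline with GNU grep's -a (--binary-files=text) option, which tells grep to treat the input as text even when it contains encoding errors. The function name gen_bytes is a hypothetical helper introduced only for this illustration, and the sketch assumes an awk whose %c emits a single byte under LC_ALL=C:

```shell
# Hypothetical helper: emit bytes 1..255, one per line, each followed
# by its decimal value (same generator as the awk command above).
gen_bytes() {
    LC_ALL=C awk 'BEGIN {for (i = 1; i < 256; i++) printf "%c %d\n", i, i}'
}

# With -a, grep prints the matching line itself instead of the
# "Binary file (standard input) matches" notice.
gen_bytes | LC_ALL=en_US.utf8 grep -a 126
# prints: ~ 126
```

The trade-off is the one described in this message: with -a the caller receives whatever bytes were on the matching lines, so output is no longer guaranteed to be validly encoded text.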

These changes were put in partly due to security issues, not only having to do with grep's internals (the old 'grep' would dump core sometimes when given encoding errors), but also for the benefit of invokers expecting properly encoded text.

To some extent we were stuck between a rock and a hard place here. No matter what 'grep' does, it will do the wrong thing for some usages. But overall we thought it better for grep's output to be valid text.

I think you can work around the problem for unfixed backup2l by setting your system's locale to a unibyte locale where all bytes are valid. The en_US.ISO-8859-15 locale, say.

Of course backup2l should get fixed, regardless of what we do with 'grep' or with your system locale.

$ find /etc/ssl/certs/ | LANG= grep pem


Wouldn't the following be better?

find /etc/ssl/certs/ -name '*.pem'

This avoids false matches like '/etc/ssl/certs/pemmican'.  Alternatively:

find /etc/ssl/certs/ -print | grep -a '\.pem$'

It is easy to catch such a file from the internet or from song or picture 
metadata.

None of the above approaches will work for arbitrary file names ("off the Internet"), because they all mishandle file names containing newlines. backup2l needs to do something like this:


find /etc/ssl/certs/ -name '*.pem' -print0

or like this:

find /etc/ssl/certs/ -print0 | grep -az '\.pem$'

with the remaining code using null bytes instead of newlines to terminate file names. This is the sort of thing that backup2l should be doing.
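As a rough sketch of what "the remaining code" could look like, the following keeps every stage of the pipeline NUL-terminated. It assumes GNU find, grep, and xargs (for -print0, -z, and -0); the function names list_pem_files and count_pem_files are hypothetical and are not taken from backup2l:

```shell
# Hypothetical helper: print every path under directory $1 whose name
# ends in ".pem", each terminated by a NUL byte rather than a newline.
list_pem_files() {
    find "$1" -print0 | grep -az '\.pem$'
}

# Hypothetical helper: count the matches by counting NUL terminators,
# so a newline inside a file name cannot inflate the count.
count_pem_files() {
    list_pem_files "$1" | tr -cd '\000' | wc -c | tr -d ' '
}
```

A consumer would then read the list with a NUL-aware tool, for example `list_pem_files /etc/ssl/certs | xargs -0 ls -ld --`, instead of splitting on newlines.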
