bug#22838: New 'Binary file' detection considered harmful

Marcello Perathoner Mon, 29 Feb 2016 12:14:16 -0800

On 02/29/2016 06:56 PM, Eric Blake wrote:

> On 02/29/2016 10:54 AM, Eric Blake wrote:
>
>> Encoding errors are not characters, but bytes.  A line cannot contain
>> encoding errors.  Therefore, a file with encoding errors is not a text file.
>
> Corollary - there exist files which are text files in some locales, but
> binary files in others (based on whether the locale interprets the bytes
> as an encoding error or as valid characters).
>
> Yes, locale dependencies on standard behavior can be annoying.

You assume that a user will only ever want to grep text files encoded in the machine's locale. That is not so.

As a German user I have on my disk files in many encodings: utf-8, iso-8859-1, win-1252, iso-8859-15, encodings that are now defunct like CP850, CP847, "German 7-bit ASCII" that replaced braces with Umlauts, old WordStar files that used control characters inside.

Since 2.21 I will now have to always specify -a or LC_ALL=C when grepping my files.
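
To make the locale dependence concrete, here is a minimal sketch in Python. It is not how grep performs its detection; it only shows that the same bytes decode cleanly under one encoding and produce an encoding error under another. The sample word and the helper name decodes_cleanly are made up for illustration.

```python
# Illustrative sample only: a German word as two different tools might store it.
SAMPLE = "Größe"


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data contains no encoding errors under the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


latin1_bytes = SAMPLE.encode("iso-8859-1")  # one byte per character
utf8_bytes = SAMPLE.encode("utf-8")         # two bytes each for "ö" and "ß"

for name, data in (("iso-8859-1 file", latin1_bytes), ("utf-8 file", utf8_bytes)):
    for assumed in ("utf-8", "iso-8859-1"):
        verdict = "clean text" if decodes_cleanly(data, assumed) else "encoding errors"
        print(f"{name} read as {assumed}: {verdict}")
```

The iso-8859-1 bytes fail to decode as utf-8, so a utf-8 locale would see that file as containing encoding errors, while the very same file is ordinary text under an iso-8859-1 locale. The reverse direction decodes without error (as garbage characters), because every byte value is a valid iso-8859-1 character.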





Regards

--
Marcello Perathoner
