On 02/29/2016 06:56 PM, Eric Blake wrote:
> On 02/29/2016 10:54 AM, Eric Blake wrote:
>> Encoding errors are not characters, but bytes. A line cannot contain
>> encoding errors. Therefore, a file with encoding errors is not a text file.
>
> Corollary - there exist files which are text files in some locales, but
> binary files in others (based on whether the locale interprets the bytes
> as an encoding error or as valid characters).
>
> Yes, locale dependencies on standard behavior can be annoying.

You assume that a user will only ever want to grep text files encoded in
the machine's locale. That is not so.
As a German user, I have files on my disk in many encodings: utf-8,
iso-8859-1, win-1252 and iso-8859-15, plus encodings that are now defunct,
such as CP850, CP847 and "German 7-bit ASCII" (which replaced braces with
umlauts), and old WordStar files that used embedded control characters.
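To make the point concrete, here is a minimal Python sketch. It is not grep's actual detection logic, only an illustration of the quoted corollary. The sample word and the helper `decodes_cleanly` are hypothetical. The bytes for "Grüße" saved by an iso-8859-1 editor decode without error under several 8-bit encodings, but the very same bytes are an encoding error under UTF-8, which is the kind of locale in which grep 2.21 would treat the file as binary.

```python
# Hypothetical sample: "Grüße" as written by an iso-8859-1 / win-1252
# editor, i.e. the bytes b"Gr\xfc\xdfe" (0xFC = ü, 0xDF = ß).
raw = "Grüße".encode("iso-8859-1")


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if data contains no encoding errors
    when interpreted in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Same bytes, different verdicts depending on the assumed encoding.
for enc in ("iso-8859-1", "cp1252", "cp850", "utf-8"):
    verdict = "text" if decodes_cleanly(raw, enc) else "encoding error"
    print(f"{enc}: {verdict}")
```

Note that CP850 accepts the bytes too, but reads them as different characters, so "no encoding error" does not mean "correctly decoded".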
Since grep 2.21, I will always have to specify -a or LC_ALL=C when
grepping my files.
Regards
--
Marcello Perathoner