bug#20526: grep BUG: text file is detected as binary

Paul Eggert Fri, 01 Jan 2016 16:31:34 -0800

Norihiro Tanaka wrote:

> I get following output after apply the patch.  Is it expected?
>
> $ printf 'a\na\377\na\n' | LANG=en_US.utf8 src/grep a
> a
> Binary file (standard input) matches

Yes, it's expected. Thanks, this should be stated more clearly, so I installed the attached documentation patch.
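
A rough Python model of the output rule that the NEWS entry in the patch describes may help explain why the output above is expected. It is a sketch under stated assumptions, not GNU grep's implementation. The names grep_sketch and has_encoding_error are hypothetical. A plain substring test stands in for grep's regex matcher, and UTF-8 decoding stands in for the en_US.utf8 locale. Only encoding errors are modelled; null input bytes, which grep detects separately while reading, are left out. The binary_files parameter loosely mirrors --binary-files, with only "binary" and "text" supported.

```python
# Hypothetical model of the post-patch text/binary output rule.
# Assumptions: substring match instead of a regex, UTF-8 instead of the
# current locale, and only encoding errors (not NUL bytes) are considered.


def has_encoding_error(line: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the line is not validly encoded."""
    try:
        line.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


def grep_sketch(data: bytes, pattern: bytes,
                name: str = "(standard input)",
                binary_files: str = "binary") -> list[bytes]:
    """Return the lines this model would print for one input file."""
    out: list[bytes] = []
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after a trailing newline
    for line in lines:
        if pattern not in line:
            # Non-matching lines are never output, so in this model an
            # encoding error in them does not make the file "binary".
            continue
        if binary_files == "binary" and has_encoding_error(line):
            # About to output a line with an encoding error: report a
            # binary match instead and suppress further output.
            out.append(b"Binary file " + name.encode() + b" matches")
            break
        out.append(line)
    return out


if __name__ == "__main__":
    # Same input as: printf 'a\na\377\na\n'
    for row in grep_sketch(b"a\na\377\na\n", b"a"):
        print(row.decode("utf-8", errors="replace"))
```

Run on the reported input, the model prints "a" and then "Binary file (standard input) matches", mirroring the output above. With binary_files="text" it returns all three matching lines unchanged, roughly as --binary-files=text (or -a) would, and an encoding error confined to a non-matching line no longer hides the text match.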

From 5bba395e00c01b8fc263d576ab3b34121fd6a3c0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 31 Dec 2015 10:02:31 -0800
Subject: [PATCH] doc: clarify text vs binary match output

* NEWS:
* doc/grep.texi (File and Directory Selection):
Make it clearer that grep can now output matching text before
reporting a binary match.  Problem reported by Norihiro Tanaka in:
http://bugs.gnu.org/20526#83
---
 NEWS          | 14 +++++++++-----
 doc/grep.texi |  9 ++++-----
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index b451d76..6e97e45 100644
--- a/NEWS
+++ b/NEWS
@@ -4,11 +4,15 @@ GNU grep NEWS                                    -*- outline -*-
 
 ** Bug fixes
 
-  Binary files are now less likely to generate diagnostics.  grep now
-  reports "Binary file FOO matches" and suppresses further output when
-  grep is about to output a match that contains an encoding error.
-  Formerly, grep reported FOO to be binary merely because grep found
-  an encoding error in FOO before generating output for FOO.
+  Binary files are now less likely to generate diagnostics and more
+  likely to yield text matches.  grep now reports "Binary file FOO
+  matches" and suppresses further output instead of outputting a line
+  containing an encoding error; hence grep can now report matching text
+  before a later binary match.  Formerly, grep reported FOO to be
+  binary when it found an encoding error in FOO before generating
+  output for FOO, which meant it never reported both matching text and
+  matching binary data; this was less useful for searching text
+  containing encoding errors in non-matching lines.
   [bug introduced in grep-2.21]
 
   grep -c no longer stops counting when finding binary data.
diff --git a/doc/grep.texi b/doc/grep.texi
index 73151e4..b9a4d25 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -594,8 +594,7 @@ this is equivalent to the @samp{--binary-files=text} option.
 @item --binary-files=@var{type}
 @opindex --binary-files
 @cindex binary files
-If a file's allocation metadata,
-or if its data read before a line is selected for output,
+If a file's data or metadata
 indicate that the file contains binary data,
 assume that the file is of type @var{type}.
 Non-text bytes indicate binary data; these are either output bytes that are
@@ -604,9 +603,9 @@ improperly encoded for the current locale, or null input bytes when the
 Options}).
 
 By default, @var{type} is @samp{binary}, and when @command{grep}
-discovers that a file is binary it normally outputs either
-a one-line message saying that a binary file matches,
-or no message if there is no match.
+discovers that a file is binary it suppresses any further output, and
+instead outputs either a one-line message saying that a binary file
+matches, or no message if there is no match.
 When processing binary data, @command{grep} may treat non-text bytes
 as line terminators; for example, the pattern @samp{.} (period) might
 not match a null byte, as the null byte might be treated as a line
-- 
2.5.0