Norihiro Tanaka wrote:
I get the following output after applying the patch. Is it expected?
$ printf 'a\na\377\na\n' | LANG=en_US.utf8 src/grep a
a
Binary file (standard input) matches
Yes, it's expected. Thanks. This should be stated more clearly, so I installed
the attached documentation patch.
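To make the rule concrete, here is a minimal C sketch of the behavior the NEWS entry below describes. It is not grep's implementation. It assumes a UTF-8 locale such as en_US.utf8, uses naive substring matching instead of grep's matchers, and works one line at a time rather than on grep's input buffers. The names sketch_grep, has_non_text, utf8_char_len and line_matches are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Length of the valid UTF-8 character at s (n bytes available),
   or 0 if the bytes there are an encoding error.  Assumes UTF-8.  */
static size_t utf8_char_len(const unsigned char *s, size_t n)
{
    unsigned char c = s[0];
    size_t len;
    unsigned long cp;

    if (c < 0x80)
        return 1;                       /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        len = 2; cp = c & 0x1F;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3; cp = c & 0x0F;
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4; cp = c & 0x07;
    } else {
        return 0;                       /* e.g. \377: never valid UTF-8 */
    }
    if (n < len)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF.  */
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)
        || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    return len;
}

/* A line holds non-text data if it has a null byte or an encoding error.  */
static bool has_non_text(const char *line, size_t len)
{
    const unsigned char *s = (const unsigned char *) line;
    size_t i = 0;

    while (i < len) {
        if (s[i] == '\0')
            return true;
        size_t k = utf8_char_len(s + i, len - i);
        if (k == 0)
            return true;
        i += k;
    }
    return false;
}

/* Naive substring test; works even if the line contains null bytes.  */
static bool line_matches(const char *line, size_t len, const char *pat)
{
    size_t plen = strlen(pat);

    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(line + i, pat, plen) == 0)
            return true;
    return false;
}

/* Hypothetical model of the new rule: print matching text lines until a
   matching line contains non-text data, then print the one-line binary
   message and suppress all further output.  Output goes into out.  */
static void sketch_grep(const char *buf, size_t size, const char *pat,
                        const char *name, char *out, size_t outsize)
{
    const char *p = buf, *end = buf + size;
    size_t used = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t) (end - p));
        size_t len = nl ? (size_t) (nl - p) : (size_t) (end - p);

        if (line_matches(p, len, pat)) {
            if (has_non_text(p, len)) {
                snprintf(out + used, outsize - used,
                         "Binary file %s matches\n", name);
                return;                 /* suppress any further output */
            }
            int w = snprintf(out + used, outsize - used, "%.*s\n", (int) len, p);
            if (w < 0 || (size_t) w >= outsize - used)
                return;                 /* out is full */
            used += (size_t) w;
        }
        p = nl ? nl + 1 : end;
    }
}
```

Calling sketch_grep on the 7 bytes of "a\na\377\na\n" with pattern "a" and name "(standard input)" fills out with "a\nBinary file (standard input) matches\n", the same two lines as in the report above: the first match is printed as text, the \377 byte makes the second match binary, and the third line is suppressed. Per the NEWS entry, grep before this change never reported both matching text and matching binary data.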
From 5bba395e00c01b8fc263d576ab3b34121fd6a3c0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 31 Dec 2015 10:02:31 -0800
Subject: [PATCH] doc: clarify text vs binary match output
* NEWS:
* doc/grep.texi (File and Directory Selection):
Make it clearer that grep can now output matching text before
reporting a binary match. Problem reported by Norihiro Tanaka in:
http://bugs.gnu.org/20526#83
---
NEWS | 14 +++++++++-----
doc/grep.texi | 9 ++++-----
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/NEWS b/NEWS
index b451d76..6e97e45 100644
--- a/NEWS
+++ b/NEWS
@@ -4,11 +4,15 @@ GNU grep NEWS -*- outline -*-
** Bug fixes
- Binary files are now less likely to generate diagnostics. grep now
- reports "Binary file FOO matches" and suppresses further output when
- grep is about to output a match that contains an encoding error.
- Formerly, grep reported FOO to be binary merely because grep found
- an encoding error in FOO before generating output for FOO.
+ Binary files are now less likely to generate diagnostics and more
+ likely to yield text matches. grep now reports "Binary file FOO
+ matches" and suppresses further output instead of outputting a line
+ containing an encoding error; hence grep can now report matching text
+ before a later binary match. Formerly, grep reported FOO to be
+ binary when it found an encoding error in FOO before generating
+ output for FOO, which meant it never reported both matching text and
+ matching binary data; this was less useful for searching text
+ containing encoding errors in non-matching lines.
[bug introduced in grep-2.21]
grep -c no longer stops counting when finding binary data.
diff --git a/doc/grep.texi b/doc/grep.texi
index 73151e4..b9a4d25 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -594,8 +594,7 @@ this is equivalent to the @samp{--binary-files=text} option.
@item --binary-files=@var{type}
@opindex --binary-files
@cindex binary files
-If a file's allocation metadata,
-or if its data read before a line is selected for output,
+If a file's data or metadata
indicate that the file contains binary data,
assume that the file is of type @var{type}.
Non-text bytes indicate binary data; these are either output bytes that are
@@ -604,9 +603,9 @@ improperly encoded for the current locale, or null input bytes when the
Options}).
By default, @var{type} is @samp{binary}, and when @command{grep}
-discovers that a file is binary it normally outputs either
-a one-line message saying that a binary file matches,
-or no message if there is no match.
+discovers that a file is binary it suppresses any further output, and
+instead outputs either a one-line message saying that a binary file
+matches, or no message if there is no match.
When processing binary data, @command{grep} may treat non-text bytes
as line terminators; for example, the pattern @samp{.} (period) might
not match a null byte, as the null byte might be treated as a line
--
2.5.0
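The NEWS entry in the patch also says that grep -c no longer stops counting when finding binary data. Here is a minimal C sketch of that counting rule, under similar assumptions (naive substring matching, line-at-a-time input instead of grep's buffers). The helpers sketch_count and line_has are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Naive substring test on a line that may contain null bytes or
   invalid UTF-8; a stand-in for grep's real matchers.  */
static bool line_has(const char *line, size_t len, const char *pat)
{
    size_t plen = strlen(pat);

    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(line + i, pat, plen) == 0)
            return true;
    return false;
}

/* Hypothetical model of grep -c after the fix: every matching line is
   counted, whether or not it contains binary data, and counting never
   stops early.  */
static size_t sketch_count(const char *buf, size_t size, const char *pat)
{
    const char *p = buf, *end = buf + size;
    size_t count = 0;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t) (end - p));
        size_t len = nl ? (size_t) (nl - p) : (size_t) (end - p);

        if (line_has(p, len, pat))
            count++;
        p = nl ? nl + 1 : end;
    }
    return count;
}
```

On the input from the report, "a\na\377\na\n", sketch_count with pattern "a" returns 3: the line with the encoding error is counted like the others instead of ending the count.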