On 03/18/2016 02:20 AM, Chiel ten Brinke wrote:
> Suppose we are doing a multiline regex pattern search on a bunch of files
> and we want to extract the matches, e.g. for further processing. By
> default, grep outputs matches separated by newlines, but since we are doing
> multiline patterns this creates the inconvenience that we cannot easily
> extract the individual matches.

Thanks for pointing this out. The problem is more than just an inconvenience: grep's behavior is incorrect, since (as you mention) its output can be ambiguous when a match itself contains a newline. -o and -z were added to GNU grep separately, and it appears that their combination wasn't considered. I installed the attached patch on the grep master branch so that -z uses null bytes to terminate output lines even when -o is used.
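
As a rough illustration of the new behavior, here is a minimal sketch that reuses the input and pattern from the tests/null-byte case in the patch. It assumes a grep that includes this fix and an xargs that supports -0 (as GNU findutils does); the variable name "matches" and the "match: " label are only for illustration.

```shell
# One NUL-terminated input record, as in the tests/null-byte case below.
# With the fix, grep -oz terminates each match with a NUL byte, so
# xargs -0 can hand the matches to another command one at a time.
matches=$(printf '%s\0' 'abcadc' \
  | grep -oz 'a[^c]*c' \
  | xargs -0 -n 1 printf 'match: %s\n')

printf '%s\n' "$matches"
# prints:
#   match: abc
#   match: adc
```

Because each match ends in a NUL byte rather than a newline, a match that spans several lines still reaches the consumer as a single item.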

From ff74ec937ef90c264d2b926e877864fd9bc479f0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 18 Mar 2016 09:40:49 -0700
Subject: [PATCH] grep: -oz now outputs null bytes, not newlines

* NEWS: Document this.
* doc/grep.texi (Other Options): Clarify that -z affects output
as well as input data.
* src/grep.c (print_line_middle): Output eolbyte, not newline, if -o.
* tests/null-byte: Test -o too.
* tests/pcre-context: Adjust test to match new behavior.
---
 NEWS               |  4 ++++
 doc/grep.texi      |  4 ++--
 src/grep.c         |  2 +-
 tests/null-byte    |  5 +++++
 tests/pcre-context | 10 ++++------
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/NEWS b/NEWS
index 9de07ab..34d6c89 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU grep NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  grep -oz now uses null bytes, not newlines, to terminate output lines.
+  [bug introduced in grep-2.5]
 
 * Noteworthy changes in release 2.24 (2016-03-10) [stable]
 
diff --git a/doc/grep.texi b/doc/grep.texi
index 8883b27..074113b 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -762,8 +762,8 @@ on platforms other than MS-DOS and MS-Windows.
 @opindex -z
 @opindex --null-data
 @cindex zero-terminated lines
-Treat the input as a set of lines, each terminated by a zero byte (the
-ASCII NUL character) instead of a newline.
+Treat input and output data as sequences of lines, each terminated by
+a zero byte (the ASCII NUL character) instead of a newline.
 Like the @option{-Z} or @option{--null} option,
 this option can be used with commands like
 @samp{sort -z} to process arbitrary file names.
diff --git a/src/grep.c b/src/grep.c
index 94ddbef..8baca5a 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1152,7 +1152,7 @@ print_line_middle (char *beg, char *lim,
           fwrite_errno (b, 1, match_size);
           pr_sgr_end_if (match_color);
           if (only_matching)
-            putchar_errno ('\n');
+            putchar_errno (eolbyte);
         }
     }
 
diff --git a/tests/null-byte b/tests/null-byte
index 1d62806..f9bb00e 100755
--- a/tests/null-byte
+++ b/tests/null-byte
@@ -60,4 +60,9 @@ printf '%s\n' xxx 'Binary file in matches' > exp || framework_failure_
 grep -E 'xxx|z' in >out || fail=1
 compare exp out || fail=1
 
+printf '%s\0' 'abcadc' >in || framework_failure_
+printf '%s\0' 'abc' 'adc' >exp || framework_failure_
+grep -oz 'a[^c]*c' in >out || fail=1
+compare exp out || fail=1
+
 Exit $fail
diff --git a/tests/pcre-context b/tests/pcre-context
index f0c96e0..b910a20 100755
--- a/tests/pcre-context
+++ b/tests/pcre-context
@@ -23,12 +23,10 @@ Preceded by 4 empty lines.
 EOF
 test $? -eq 0 || framework_failure_
 
-cat >exp <<'EOF'
-Preceded by 2 empty lines.
-Preceded by 3 empty lines.
-Preceded by 4 empty lines.
-EOF
-test $? -eq 0 || framework_failure_
+printf '%s\0' \
+       'Preceded by 2 empty lines.' \
+       'Preceded by 3 empty lines.' \
+       'Preceded by 4 empty lines.' >exp || framework_failure_
 
 fail=0
 
-- 
2.5.0
