On 03/18/2016 02:20 AM, Chiel ten Brinke wrote:
> Suppose we are doing a multiline regex pattern search on a bunch of files and we want to extract the matches, e.g. for further processing. By default, grep outputs matches separated by newlines, but since we are doing multiline patterns this creates the inconvenience that we cannot easily extract the individual matches.
Thanks for pointing this out. The problem is more than just an inconvenience: grep's behavior is incorrect, since (as you mention) its output can be ambiguous. -o and -z were added separately to GNU grep, and it appears that their combination wasn't considered. I installed the attached patch to the grep master so that -z uses null bytes to terminate output lines even when -o is used.
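As a rough illustration of the new behavior, here is a minimal sketch that reuses the input and pattern from the tests/null-byte case in the patch. It assumes a grep that already includes this fix and an xargs that supports -0 (GNU and BSD versions do); the scratch file and the variable names `tmp`, `count` and `listing` are only illustrative.

```shell
# Hypothetical scratch file; the input mirrors the tests/null-byte case in the patch.
tmp=$(mktemp)
printf '%s\0' 'abcadc' >"$tmp"

# Count the NUL terminators that grep -oz emits (one per match, assuming the fix).
count=$(grep -oz 'a[^c]*c' "$tmp" | tr -cd '\0' | wc -c)

# Hand each NUL-terminated match to printf as a separate argument.
listing=$(grep -oz 'a[^c]*c' "$tmp" | xargs -0 printf '[%s]\n')

echo "matches: $count"
echo "$listing"
rm -f "$tmp"
```

With a grep that still emits newlines under -o, the count of NUL bytes would be zero, which is one quick way to check whether the installed grep has the fix. Because every match ends in a NUL byte, a match that itself contains newlines still arrives as a single item.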
From ff74ec937ef90c264d2b926e877864fd9bc479f0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 18 Mar 2016 09:40:49 -0700
Subject: [PATCH] grep: -oz now outputs null bytes, not newlines

* NEWS: Document this.
* doc/grep.texi (Other Options): Clarify that -z affects output as
well as input data.
* src/grep.c (print_line_middle): Output eolbyte, not newline, if -o.
* tests/null-byte: Test -o too.
* tests/pcre-context: Adjust test to match new behavior.
---
 NEWS               |  4 ++++
 doc/grep.texi      |  4 ++--
 src/grep.c         |  2 +-
 tests/null-byte    |  5 +++++
 tests/pcre-context | 10 ++++------
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/NEWS b/NEWS
index 9de07ab..34d6c89 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU grep NEWS -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  grep -oz now uses null bytes, not newlines, to terminate output lines.
+  [bug introduced in grep-2.5]
 
 * Noteworthy changes in release 2.24 (2016-03-10) [stable]
 
diff --git a/doc/grep.texi b/doc/grep.texi
index 8883b27..074113b 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -762,8 +762,8 @@ on platforms other than MS-DOS and MS-Windows.
 @opindex -z
 @opindex --null-data
 @cindex zero-terminated lines
-Treat the input as a set of lines, each terminated by a zero byte (the
-ASCII NUL character) instead of a newline.
+Treat input and output data as sequences of lines, each terminated by
+a zero byte (the ASCII NUL character) instead of a newline.
 Like the @option{-Z} or @option{--null} option,
 this option can be used with commands like
 @samp{sort -z} to process arbitrary file names.
 
diff --git a/src/grep.c b/src/grep.c
index 94ddbef..8baca5a 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1152,7 +1152,7 @@ print_line_middle (char *beg, char *lim,
               fwrite_errno (b, 1, match_size);
               pr_sgr_end_if (match_color);
               if (only_matching)
-                putchar_errno ('\n');
+                putchar_errno (eolbyte);
             }
         }
 
diff --git a/tests/null-byte b/tests/null-byte
index 1d62806..f9bb00e 100755
--- a/tests/null-byte
+++ b/tests/null-byte
@@ -60,4 +60,9 @@ printf '%s\n' xxx 'Binary file in matches' > exp || framework_failure_
 grep -E 'xxx|z' in >out || fail=1
 compare exp out || fail=1
 
+printf '%s\0' 'abcadc' >in || framework_failure_
+printf '%s\0' 'abc' 'adc' >exp || framework_failure_
+grep -oz 'a[^c]*c' in >out || fail=1
+compare exp out || fail=1
+
 Exit $fail
diff --git a/tests/pcre-context b/tests/pcre-context
index f0c96e0..b910a20 100755
--- a/tests/pcre-context
+++ b/tests/pcre-context
@@ -23,12 +23,10 @@ Preceded by 4 empty lines.
 EOF
 test $? -eq 0 || framework_failure_
 
-cat >exp <<'EOF'
-Preceded by 2 empty lines.
-Preceded by 3 empty lines.
-Preceded by 4 empty lines.
-EOF
-test $? -eq 0 || framework_failure_
+printf '%s\0' \
+       'Preceded by 2 empty lines.' \
+       'Preceded by 3 empty lines.' \
+       'Preceded by 4 empty lines.' >exp || framework_failure_
 
 fail=0
 
-- 
2.5.0