2014-02-24 20:55:42 -0800, Jim Meyering:
> On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas
> <stephane.chaze...@gmail.com> wrote:
> > A last note: with -w, pcregrep wraps the regexp in \b...\b
> > instead of \b(?:...)\b, so it could be that those brackets are
> > not necessary in the first place.

The brackets are actually needed in cases like:

grep -Pw 'foo|bar'

(pcregrep has a bug there).

> > Maybe instead of \b(?:...)\b, we could use (?<!\w)...(?!\w)
> >
> > $ echo a%%b | grep -P '(?<!\w)%%(?!\w)'
> > $ echo %aa% | grep -P '(?<!\w)aa(?!\w)'
> > %aa%
>
> I like both suggestions.  Making -wP work like grep's -w makes perfect sense.
> Care to prepare a patch to make it do that, with a separate test case?
> "git format-patch ..." output preferred, if you're game.
>
> I pushed the above patch, but would welcome another one.

Please find the patch attached.

(note that tests/word-delim-multibyte fails for me, but it's not
my doing, it was failing before).

-- 
Stephane
From c0f44ae6d988954557c0da533f336c9e522f570a Mon Sep 17 00:00:00 2001
From: Stephane Chazelas <stephane.chaze...@gmail.com>
Date: Tue, 25 Feb 2014 15:55:04 +0000
Subject: [PATCH] Align grep -Pw with grep -w

For the -w option, with -P, we used to look for the pattern
surrounded by word boundaries. That's different from what grep
-w does and what the documentation describes. Now align with
grep -w and the documentation by using PCRE look-behind and
look-ahead operators to match the pattern if it is not surrounded
by word constituents.

* src/pcresearch.c (Pcompile): Use (?<!\w)(?:...)(?!\w) rather
than \b(?:...)\b.
* NEWS (Bug fixes): Mention it.
* tests/pcre-w: New file.
* tests/Makefile.am (TESTS): Add it.

That complements the fix for http://debbugs.gnu.org/16865
---
 NEWS              |  3 +++
 src/pcresearch.c  |  4 ++--
 tests/Makefile.am |  1 +
 tests/pcre-w      | 31 +++++++++++++++++++++++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100755 tests/pcre-w

diff --git a/NEWS b/NEWS
index 49fe984..d4f9a89 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,9 @@ GNU grep NEWS -*- outline -*-
   echo aa|grep -Pw '(.)\1' would fail to match, yet
   echo aa|grep -Pw '(.)\2' would match.
 
+  grep -Pw now works like grep -w in that the matched string has to be
+  preceded and followed by non-word components or the beginning and end
+  of the line (as opposed to word boundaries before).
 
 * Noteworthy changes in release 2.18 (2014-02-20) [stable]
 
diff --git a/src/pcresearch.c b/src/pcresearch.c
index d4a20ff..319155f 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -77,7 +77,7 @@ Pcompile (char const *pattern, size_t size)
   if (match_lines)
     strcpy (n, "^(?:");
   if (match_words)
-    strcpy (n, "\\b(?:");
+    strcpy (n, "(?<!\\w)(?:");
   n += strlen (n);
 
   /* The PCRE interface doesn't allow NUL bytes in the pattern, so
@@ -103,7 +103,7 @@ Pcompile (char const *pattern, size_t size)
   n += patlim - p;
   *n = '\0';
   if (match_words)
-    strcpy (n, ")\\b");
+    strcpy (n, ")(?!\\w)");
   if (match_lines)
     strcpy (n, ")$");
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index ecbe0e6..742a580 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -83,6 +83,7 @@ TESTS = \
   pcre-abort \
   pcre-invalid-utf8-input \
   pcre-utf8 \
+  pcre-w \
   pcre-wx-backref \
   pcre-z \
   prefix-of-multibyte \
diff --git a/tests/pcre-w b/tests/pcre-w
new file mode 100755
index 0000000..5040c5a
--- /dev/null
+++ b/tests/pcre-w
@@ -0,0 +1,31 @@
+#! /bin/sh
+# Before grep-2.19, grep -Pw %% would match %% enclosed in word boundaries
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_pcre_
+
+fail=0
+
+echo %aa% > in || framework_failure_
+grep -Pw aa in > out || fail=1
+compare out in || fail=1
+
+echo a%%a > in || framework_failure_
+grep -Pw %% in > out && fail=1
+compare /dev/null out || fail=1
+
+echo %%%% > in || framework_failure_
+grep -Pw %% in > out || fail=1
+compare out in || fail=1
+
+echo %% > in || framework_failure_
+grep -Pw %% in > out || fail=1
+compare out in || fail=1
+
+Exit $fail
-- 
1.8.5.3