2014-02-24 20:55:42 -0800, Jim Meyering:
> On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas
> <stephane.chaze...@gmail.com> wrote:
> > A last note: with -w, pcregrep wraps the regexp in \b...\b
> > instead of \b(?:...)\b, so it could be that those brackets are
> > not necessary in the first place.

The brackets are actually needed when the pattern contains a
top-level alternation, as in:

grep -Pw 'foo|bar'

(pcregrep has a bug there).
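
To illustrate why the group matters, here is a minimal sketch. It
assumes a grep built with PCRE support (-P). try() is a hypothetical
helper, not part of grep or its test suite, and the two patterns spell
out by hand roughly what a -w wrapper around 'foo|bar' looks like
without and with the group:

```shell
# Hypothetical helper: report whether PCRE pattern $1 matches line $2.
# Assumes grep was built with PCRE support (-P).
try() {
  if printf '%s\n' "$2" | grep -qP -- "$1"; then
    echo match
  else
    echo 'no match'
  fi
}

# Without the group, the alternation splits the wrapper in two:
# \bfoo|bar\b means "\bfoo" OR "bar\b".
ungrouped=$(try '\bfoo|bar\b' 'food')

# With the group, both boundaries apply to every alternative.
grouped=$(try '\b(?:foo|bar)\b' 'food')

echo "ungrouped: $ungrouped"
echo "grouped:   $grouped"
```

With the ungrouped form, "food" matches through the \bfoo branch even
though neither foo nor bar appears as a whole word. The grouped form
rejects it.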


> > Maybe instead of \b(?:...)\b, we could use (?<!\w)...(?!\w)
> >
> > $ echo a%%b | grep -P '(?<!\w)%%(?!\w)'
> > $ echo %aa% | grep -P '(?<!\w)aa(?!\w)'
> > %aa%
> 
> I like both suggestions. Making -wP work like grep's -w makes perfect sense.
> Care to prepare a patch to make it do that, with a separate test case?
> "git format-patch ..." output preferred, if you're game.
> 
> I pushed the above patch, but would welcome another one.

Please find the patch attached.

Note that tests/word-delim-multibyte fails for me, but that is not
caused by this patch: it was already failing before.
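
For anyone who wants to see the behaviour change without building the
patch, the following is a rough sketch. It assumes a grep built with
PCRE support (-P) and writes out the old and new wrappers by hand
instead of relying on -w, so the result should not depend on the grep
version. try() is a hypothetical helper, not something from the grep
sources:

```shell
# Hypothetical helper: report whether PCRE pattern $1 matches line $2.
# Assumes grep was built with PCRE support (-P).
try() {
  if printf '%s\n' "$2" | grep -qP -- "$1"; then
    echo match
  else
    echo 'no match'
  fi
}

old='\b(?:%%)\b'            # wrapper used before the patch
new='(?<!\w)(?:%%)(?!\w)'   # wrapper used after the patch

# %% glued to word characters: the old wrapper accepts it, the new one does not.
old_mid=$(try "$old" 'a%%a')
new_mid=$(try "$new" 'a%%a')

# %% alone on the line: the old wrapper finds no word boundary, the new one matches.
old_bare=$(try "$old" '%%')
new_bare=$(try "$new" '%%')

echo "a%%a: old=$old_mid, new=$new_mid"
echo "%%:   old=$old_bare, new=$new_bare"
```

The two inputs correspond to the a%%a and %% cases in tests/pcre-w.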

-- 
Stephane
From c0f44ae6d988954557c0da533f336c9e522f570a Mon Sep 17 00:00:00 2001
From: Stephane Chazelas <stephane.chaze...@gmail.com>
Date: Tue, 25 Feb 2014 15:55:04 +0000
Subject: [PATCH] Align grep -Pw with grep -w

With -P, the -w option used to look for the pattern surrounded by word
boundaries. That differs from what grep -w does and from what the
documentation describes. Now align with grep -w and the documentation by
using PCRE look-behind and look-ahead operators, so that the pattern
matches only if it is neither preceded nor followed by a word constituent.
* src/pcresearch.c (Pcompile): Use (?<!\w)(?:...)(?!\w) rather than
  \b(?:...)\b.
* NEWS (Bug fixes): Mention it.
* tests/pcre-w: New file.
* tests/Makefile.am (TESTS): Add it.
This complements the fix for http://debbugs.gnu.org/16865
---
 NEWS              |  3 +++
 src/pcresearch.c  |  4 ++--
 tests/Makefile.am |  1 +
 tests/pcre-w      | 31 +++++++++++++++++++++++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100755 tests/pcre-w

diff --git a/NEWS b/NEWS
index 49fe984..d4f9a89 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,9 @@ GNU grep NEWS                                    -*- outline -*-
   echo aa|grep -Pw '(.)\1' would fail to match, yet
   echo aa|grep -Pw '(.)\2' would match.
 
+  grep -Pw now works like grep -w: the matched string must be preceded
+  and followed by a non-word constituent or by the beginning or end of
+  the line (previously it only had to sit between word boundaries).
 
 * Noteworthy changes in release 2.18 (2014-02-20) [stable]
 
diff --git a/src/pcresearch.c b/src/pcresearch.c
index d4a20ff..319155f 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -77,7 +77,7 @@ Pcompile (char const *pattern, size_t size)
   if (match_lines)
     strcpy (n, "^(?:");
   if (match_words)
-    strcpy (n, "\\b(?:");
+    strcpy (n, "(?<!\\w)(?:");
   n += strlen (n);
 
   /* The PCRE interface doesn't allow NUL bytes in the pattern, so
@@ -103,7 +103,7 @@ Pcompile (char const *pattern, size_t size)
   n += patlim - p;
   *n = '\0';
   if (match_words)
-    strcpy (n, ")\\b");
+    strcpy (n, ")(?!\\w)");
   if (match_lines)
     strcpy (n, ")$");
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index ecbe0e6..742a580 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -83,6 +83,7 @@ TESTS =						\
   pcre-abort					\
   pcre-invalid-utf8-input			\
   pcre-utf8					\
+  pcre-w					\
   pcre-wx-backref				\
   pcre-z					\
   prefix-of-multibyte				\
diff --git a/tests/pcre-w b/tests/pcre-w
new file mode 100755
index 0000000..5040c5a
--- /dev/null
+++ b/tests/pcre-w
@@ -0,0 +1,31 @@
+#! /bin/sh
+# Before grep-2.19, grep -Pw %% matched only between word boundaries (a%%a)
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_pcre_
+
+fail=0
+
+echo %aa% > in || framework_failure_
+grep -Pw aa in > out || fail=1
+compare out in || fail=1
+
+echo a%%a > in || framework_failure_
+grep -Pw %% in > out && fail=1
+compare /dev/null out || fail=1
+
+echo %%%% > in || framework_failure_
+grep -Pw %% in > out || fail=1
+compare out in || fail=1
+
+echo %% > in || framework_failure_
+grep -Pw %% in > out || fail=1
+compare out in || fail=1
+
+Exit $fail
-- 
1.8.5.3
