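Fine by me, thanks. BTW, as discussed in another bug, -w/-x invalidate (*UCP) and the other PCRE special sequences that are only recognised at the very start of the pattern (with -w/-x they end up inside the added wrapper). Chances are we can't easily do much about it, but it may still be worth documenting.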
Like, one should use grep -P '(*UCP)\bword\b' as grep -wP '(*UCP)word'
won't work (pcregrep has the same problem).

In another bug, I've seen someone commenting that grep -wP 'a)(b'
doesn't give the error message that one would expect (not that
I'd expect anyone would care).

A last note: with -w, pcregrep wraps the regexp in \b...\b instead
of \b(?:...)\b, so it could be that those brackets are not necessary
in the first place.

Sorry I lied, it was not the last note ;-). Note the difference:

$ echo a@@b | grep -w @@
$ echo a@@b | grep -Pw @@
a@@b

Maybe instead of \b(?:...)\b, we could use (?<!\w)...(?!\w)

$ echo a%%b | grep -P '(?<!\w)%%(?!\w)'
$ echo %aa% | grep -P '(?<!\w)aa(?!\w)'
%aa%

Full text of original email included for reference:

2014-02-24 12:00:08 -0800, Jim Meyering:
> On Mon, Feb 24, 2014 at 2:01 AM, Stephane Chazelas
> <stephane.chaze...@gmail.com> wrote:
> > Hello,
> >
> > Backreferences don't work with -w or -x in combination with -P:
> >
> > $ echo aa | grep -Pw '(.)\1'
> > $
> >
> > Or they work in an unexpected way:
> >
> > $ echo aa | grep -Pw '(.)\2'
> > aa
> >
> > The fix is simple:
> >
> >
> > --- src/pcresearch.c~ 2014-02-24 09:59:56.864374362 +0000
> > +++ src/pcresearch.c 2014-02-24 07:33:04.666398105 +0000
> > @@ -75,9 +75,9 @@ Pcompile (char const *pattern, size_t si
>
> Thanks a lot for the patch.
> I've converted it to a proper commit with NEWS and a test case.
> Please ack the attached if it's all ok with you (you're still the "Author:"):
> From bfd21931b3cd088d642a190e9f030214df04045d Mon Sep 17 00:00:00 2001
> From: Stephane Chazelas <stephane.chaze...@gmail.com>
> Date: Mon, 24 Feb 2014 11:54:09 -0800
> Subject: [PATCH] grep -P: fix it so backreferences now work with -w and -x
>
> To implement -w and -x, we bracket the search term with parentheses.
> However, that set of parentheses had the default semantics of
> "capturing", i.e., creating a backreferenceable matched quantity.
> Instead, use (?:...), to create a non-capturing group.
> * src/pcresearch.c (Pcompile): Use (?:...) rather than (...).
> * NEWS (Bug fixes): Mention it.
> * tests/pcre-wx-backref: New file.
> * tests/Makefile.am (TESTS): Add it.
> ---
>  NEWS                  |  6 ++++++
>  src/pcresearch.c      |  4 ++--
>  tests/Makefile.am     |  1 +
>  tests/pcre-wx-backref | 28 ++++++++++++++++++++++++++++
>  4 files changed, 37 insertions(+), 2 deletions(-)
>  create mode 100755 tests/pcre-wx-backref
>
> diff --git a/NEWS b/NEWS
> index 771fd80..49fe984 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,6 +2,12 @@ GNU grep NEWS -*- outline -*-
>
>  * Noteworthy changes in release ?.? (????-??-??) [?]
>
> +** Bug fixes
> +
> +  grep -P now works with -w and -x and backreferences. Before,
> +  echo aa|grep -Pw '(.)\1' would fail to match, yet
> +  echo aa|grep -Pw '(.)\2' would match.
> +
>
>  * Noteworthy changes in release 2.18 (2014-02-20) [stable]
>
> diff --git a/src/pcresearch.c b/src/pcresearch.c
> index 5b5ba3e..d4a20ff 100644
> --- a/src/pcresearch.c
> +++ b/src/pcresearch.c
> @@ -75,9 +75,9 @@ Pcompile (char const *pattern, size_t size)
>
>    *n = '\0';
>    if (match_lines)
> -    strcpy (n, "^(");
> +    strcpy (n, "^(?:");
>    if (match_words)
> -    strcpy (n, "\\b(");
> +    strcpy (n, "\\b(?:");
>    n += strlen (n);
>
>    /* The PCRE interface doesn't allow NUL bytes in the pattern, so
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 4ffea85..ecbe0e6 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -83,6 +83,7 @@ TESTS = \
>    pcre-abort \
>    pcre-invalid-utf8-input \
>    pcre-utf8 \
> +  pcre-wx-backref \
>    pcre-z \
>    prefix-of-multibyte \
>    r-dot \
> diff --git a/tests/pcre-wx-backref b/tests/pcre-wx-backref
> new file mode 100755
> index 0000000..643aa9b
> --- /dev/null
> +++ b/tests/pcre-wx-backref
> @@ -0,0 +1,28 @@
> +#! /bin/sh
> +# Before grep-2.19, grep -P and -w/-x would not with a backreference.
> +#
> +# Copyright (C) 2014 Free Software Foundation, Inc.
> +#
> +# Copying and distribution of this file, with or without modification,
> +# are permitted in any medium without royalty provided the copyright
> +# notice and this notice are preserved.
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +require_pcre_
> +
> +echo aa > in || framework_failure_
> +echo 'grep: reference to non-existent subpattern' > exp-err \
> +  || framework_failure_
> +
> +fail=0
> +
> +for xw in x w; do
> +  grep -P$xw '(.)\1' in > out 2>&1 || fail=1
> +  compare out in || fail=1
> +
> +  grep -P$xw '(.)\2' in > out 2> err && fail=1
> +  compare /dev/null out || fail=1
> +  compare exp-err err || fail=1
> +done
> +
> +Exit $fail
> --
> 1.9.0
>
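
For anyone wanting to reproduce the behaviours discussed in this thread without depending on how a particular grep version implements -w, here is a minimal sketch that writes the three wrappings out by hand and feeds them to grep -P. The helper names (wrap_capturing, wrap_noncapture, wrap_lookaround, try) are made up for illustration and are not part of grep. The (?:...) kept inside the lookaround variant is my own assumption, so that an alternation such as a|b stays bracketed. It assumes a GNU grep built with PCRE support.

```shell
#!/bin/sh
# Hypothetical helpers (not part of grep): each prints the regex that one
# -w wrapping strategy would hand to PCRE for the user pattern "$1".
wrap_capturing()  { printf '%s' "\\b($1)\\b"; }             # before the patch
wrap_noncapture() { printf '%s' "\\b(?:$1)\\b"; }           # after the patch
wrap_lookaround() { printf '%s' "(?<!\\w)(?:$1)(?!\\w)"; }  # suggested alternative

# try INPUT REGEX: report whether grep -P matches one line of input.
try() {
  if printf '%s\n' "$1" | grep -qP -e "$2"; then
    echo match
  else
    echo "no match"
  fi
}

# Backreferences: the extra capturing group shifts the numbering, so \1
# points at the wrapper itself and \2 at the user's own group.
try aa "$(wrap_capturing  '(.)\1')"   # no match
try aa "$(wrap_capturing  '(.)\2')"   # match
try aa "$(wrap_noncapture '(.)\1')"   # match

# Non-word patterns: \b only asks for a word/non-word transition, while
# the lookarounds ask that no word character be adjacent to the match.
try 'a@@b' "$(wrap_noncapture '@@')"  # match
try 'a@@b' "$(wrap_lookaround '@@')"  # no match
try '%aa%' "$(wrap_lookaround 'aa')"  # match
```

The last three calls mirror the a@@b and %aa% examples above: with the lookaround form, the -P result for a@@b lines up with what plain grep -w prints.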