Re the savannah bug report, http://savannah.gnu.org/bugs/?37600
[Let's continue on the mailing list -- now our preferred medium]
On Mon, Feb 24, 2014 at 6:57 AM, Stephane Chazelas wrote:
[...]
Thanks for the report.
I confirm it is still a problem with the latest, grep-2.18:
[Note that there's no
Hello,
Backreferences don't work with -w or -x in combination with -P:
$ echo aa | grep -Pw '(.)\1'
$
Or they work in an unexpected way:
$ echo aa | grep -Pw '(.)\2'
aa
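[Editorial sketch, not part of the original report. Both symptoms are consistent with -w wrapping the user's pattern in an extra capturing group, which shifts every backreference by one. The wrapper strings below are assumptions made for illustration, not copied from src/pcresearch.c, and Python's re module stands in for PCRE. One difference: Python rejects a reference to a still-open group at compile time, where grep -P simply printed nothing.]

```python
import re


def wrap_capturing(pattern):
    # Hypothetical -w wrapper that adds a capturing group (assumed, for illustration).
    return r'(?<!\w)(' + pattern + r')(?!\w)'


def wrap_non_capturing(pattern):
    # Hypothetical wrapper using a non-capturing group, so user group numbers are kept.
    return r'(?<!\w)(?:' + pattern + r')(?!\w)'


def try_search(pattern, text):
    # Return the matched text, None when nothing matches,
    # or "error" when the pattern is rejected by the regex engine.
    try:
        m = re.search(pattern, text)
    except re.error:
        return "error"
    return m.group(0) if m else None


if __name__ == "__main__":
    for user_pattern in (r'(.)\1', r'(.)\2'):
        print(user_pattern,
              "capturing:", try_search(wrap_capturing(user_pattern), "aa"),
              "non-capturing:", try_search(wrap_non_capturing(user_pattern), "aa"))
```

With the capturing wrapper the user's `(.)` becomes group 2, so only `\2` reaches it; with the non-capturing wrapper `\1` behaves as written.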
The fix is simple:
--- src/pcresearch.c~ 2014-02-24 09:59:56.864374362 +
+++ src/pcresearch.c 2014-02-24 07:33:04.
On Mon, Feb 24, 2014 at 2:01 AM, Stephane Chazelas
wrote:
> Hello,
>
> Backreferences don't work with -w or -x in combination with -P:
>
> $ echo aa | grep -Pw '(.)\1'
> $
>
> Or they work in an unexpected way:
>
> $ echo aa | grep -Pw '(.)\2'
> aa
>
> The fix is simple:
>
>
> --- src/pcresearch.c
Fine by me, thanks.
BTW, as discussed in another bug, the -w/-x invalidate the
(*UCP) and other PCRE special sequences. Chances are we can't
easily do much about it, but it may still be worth documenting.
Like, one should use
grep -P '(*UCP)\bword\b'
as
grep -wP '(*UCP)word'
won't work (pcreg
2014-02-24 08:53:17 -0800, Jim Meyering:
[...]
> This is pretty serious:
>
> $ printf 'p\xc3\xa8re\n' |LC_ALL=en_US.utf8 grep -w p
> père
It gets more complicated with combining characters:
$ printf 'pe\314\200re\n' | grep -w pe
père
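[Editorial sketch, not part of the original message. Python's re module is used here only as an analogue for grep's word matching. It shows why a word boundary appears between "e" and the combining grave accent U+0300: the accent is a nonspacing mark, not a letter, so a per-character word test does not treat it as a word character.]

```python
import re
import unicodedata

# "pe" + COMBINING GRAVE ACCENT + "re", the same bytes as printf 'pe\314\200re\n'
text = "pe\u0300re"

# General category of U+0300: a nonspacing mark rather than a letter.
accent_category = unicodedata.category("\u0300")

# \b is found after "e" because the next character is not a word character,
# so the "word" pe matches inside the decomposed form.
boundary_match = re.search(r"\bpe\b", text)

if __name__ == "__main__":
    print(accent_category)
    print(boundary_match is not None)
```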
You can't expect \w to match U+0300 alone. You can't e
On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas
wrote:
> A last note: with -w, pcregrep wraps the regexp in \b...\b
> instead of \b(?:...)\b, so it could be that those brackets are
> not necessary in the first place.
>
> Sorry I lied, it was not the last note ;-). Note the difference:
>
> $ ech
The doc has a confusing statement:
> 15. How can I match across lines?
>
> Standard grep cannot do this, as it is fundamentally line-based.
> Therefore, merely using the '[:space:]' character class does not
> match newlines in the way you might expect. However, if your grep
> is compi