Hello,
Backreferences don't work with -w or -x in combination with -P:
$ echo aa | grep -Pw '(.)\1'
$
Or they work in an unexpected way:
$ echo aa | grep -Pw '(.)\2'
aa
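Both results are what one would expect if -w wrapped the user's pattern in an extra capturing group, which shifts every group number up by one. A minimal sketch of that effect, assuming a grep built with PCRE support; the wrappers below are written by hand for illustration and are not copied from grep's source, and count is a hypothetical helper:

```shell
# Count the lines of the input "aa" that match a PCRE pattern.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count() { echo aa | grep -cP -- "$1" || true; }

capturing_1=$(count '\b((.)\1)\b')      # \1 now refers to the wrapper group itself
capturing_2=$(count '\b((.)\2)\b')      # the user's group has become number 2
non_capturing=$(count '\b(?:(.)\1)\b')  # (?:...) leaves the numbering untouched

printf '%s %s %s\n' "$capturing_1" "$capturing_2" "$non_capturing"
```

With a non-capturing wrapper the user's \1 keeps meaning what the user wrote.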
The fix is simple:
--- src/pcresearch.c~ 2014-02-24 09:59:56.864374362 +0000
+++ src/pcresearch.c 2014-02-24 07:33:04.
[...]
A last note: with -w, pcregrep wraps the regexp in \b...\b
instead of \b(?:...)\b, so it could be that those brackets are
not necessary in the first place.
Sorry I lied, it was not the last note ;-). Note the difference:
$ echo a@@b | grep -w @@
$ echo a@@b | grep -Pw @@
a@@b
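The difference comes from \b: it only asks whether a word character and a non-word character are adjacent, so it is satisfied between "a" and "@". A sketch comparing \b with lookarounds, assuming a grep built with PCRE support. The lookaround wrapper is my own illustration of one way to express "not next to a word character"; it is not taken from grep's source, and count is a hypothetical helper:

```shell
# Count the lines of text $2 that match PCRE pattern $1.
count() { printf '%s\n' "$2" | grep -cP -- "$1" || true; }

glued_b=$(count '\b@@\b' 'a@@b')                   # boundary exists between a and @
glued_look=$(count '(?<!\w)@@(?!\w)' 'a@@b')       # rejected: "a" precedes the match
spaced_b=$(count '\b@@\b' 'a @@ b')                # no boundary between " " and "@"
spaced_look=$(count '(?<!\w)@@(?!\w)' 'a @@ b')    # accepted: spaces on both sides

printf '%s %s %s %s\n' "$glued_b" "$glued_look" "$spaced_b" "$spaced_look"
```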
Maybe instead of \b(?:...)\b, we could use (? On Mon, Feb 24,
2014-02-24 08:53:17 -0800, Jim Meyering:
[...]
> This is pretty serious:
>
> $ printf 'p\xc3\xa8re\n' |LC_ALL=en_US.utf8 grep -w p
> père
It gets more complicated with combining characters:
$ printf 'pe\314\200re\n' | grep -w pe
père
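The two inputs render identically but are different byte sequences. A small sketch that dumps both, so the "pe" prefix followed by the combining U+0300 is visible; bytes is a hypothetical helper name:

```shell
# Print the bytes of a printf format string as space-separated hex.
bytes() { printf "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

precomposed=$(bytes 'p\303\250re')    # "è" as the single code point U+00E8
decomposed=$(bytes 'pe\314\200re')    # "e" followed by combining grave U+0300

printf '%s\n%s\n' "$precomposed" "$decomposed"
```

In the decomposed form the text really does start with the bytes for "pe", so whether -w accepts it depends on whether the following U+0300 counts as a word constituent.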
You can't expect \w to match U+0300 alone. You can't e
The doc has a confusing statement:
> 15. How can I match across lines?
>
>Standard grep cannot do this, as it is fundamentally line-based.
>Therefore, merely using the '[:space:]' character class does not
>match newlines in the way you might expect. However, if your grep
>is compi
Also:
$ printf 'a\nb\0' | grep -z 'a$'
$ printf 'a\nb\0' | grep -zP 'a$'
a
b
$ printf 'a\nb\0' | grep -zxP a
a
b
Why use PCRE_MULTILINE here?
2014-02-24 20:55:42 -0800, Jim Meyering:
> On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas
> wrote:
> > A last note: with -w, pcregrep wraps the regexp in \b...\b
> > instead of \b(?:...)\b, so it could be that those brackets are
> > not necessary in the first place.
T
2014-02-25 10:03:28 -0800, Jim Meyering:
[...]
> I've uncapitalized the 1-line summary, changed a That to This
> in the log, and added examples to NEWS, and added an empty
> line to restore the 2-empty-line section delimiter.
[...]
> I've closed this ticket, and will push once you ack these changes
2015-11-08 21:49:03 +0100, Valerio Bozzolan:
> Sorry... typo...
>
> echo abcde | grep -o -E 'b([a-z])d'
> => "bcd"
>
> Can't I choose to have only "c"?
[...]
That's correct, GNU grep doesn't have that capability (yet).
Recent versions of pcregrep do:
$ echo abc | pcregrep -o1 '.(.).'
b
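Until grep grows such an option, two common workarounds give the same result for the question's example. This is a sketch, not something proposed in the thread: the sed form is portable, while the grep form assumes a build with PCRE support.

```shell
# sed: replace the whole line with what the first group captured.
with_sed=$(echo abcde | sed -n 's/.*b\([a-z]\)d.*/\1/p')

# grep -P: \K drops "b" from the reported match, (?=d) checks "d" without consuming it.
with_grep=$(echo abcde | grep -oP 'b\K[a-z](?=d)')

printf '%s %s\n' "$with_sed" "$with_grep"
```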
2015-11-09 17:00:36 +0100, Valerio Bozzolan:
> Thanks for agreeing with the evolution of the meaning of "-o".
>
> Just to make you laugh: I was reproducing egrep with $BASH_REMATCH:
> https://gist.github.com/valerio-bozzolan/6787675e931dce1ba7e9
>
> Definitely not beautiful... but really effect
2015-11-22 21:24:05 -0800, Shivanshu Goyal:
[...]
> I think I found a bug which did not exist in version 2.14, but does seem to
> exist in versions 2.16 and 2.22. I have not tested any other versions.
>
> Say there is a file with the following contents:
>
> shivanshu@thetis:tmp$ cat temp | xxd
>
Hello,
I'm just finding out that ^ and $ no longer work with grep -Pz:
https://unix.stackexchange.com/questions/324263/grep-command-doesnt-support-and-anchors-when-its-with-pz
$ grep -Pz '^'
grep: unescaped ^ or $ not supported with -Pz
Which points to this bug.
Note that, it's not that pcre do
2016-11-18 07:54:29 -0800, Paul Eggert:
> Stephane Chazelas wrote:
> >Can that bug please be reopened so it can be addressed
> >differently (PCRE_MULTILINE removed, PCRE_DOLLAR_ENDONLY added)?
>
> Removing PCRE_MULTILINE will make 'grep' way slower. Can you
>
2016-11-18 16:24:02 +0000, Stephane Chazelas:
[...]
> Why would it make it slower. AFAICT, PCRE_MULTILINE *adds*
> some overhead. It is really meant to be only about changing the
> behaviour of ^ and $.
[...]
Unsurprisingly,
where ~/a contains the output of find / -print0 (which is
2016-11-18 08:48:04 -0800, Paul Eggert:
> Stephane Chazelas wrote:
> >Why would it make it slower. AFAICT, PCRE_MULTILINE *adds*
> >some overhead.
>
> As I understand it, PCRE_MULTILINE lets 'grep' apply a pattern to an
> entire buffer that contains many lines
2016-11-18 17:06:36 +0000, Stephane Chazelas:
[...]
> That might have been the case a long time ago, as I remember
> some discussion about it as it explained some wrong information
> in the documentation, but as far as I and gdb can tell, grep
> 2.26 at least call pcre_exec for every
For the record, the doc/test confusion was fixed by commit
b73296ace186451b096b075461634c153d1fa525
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=b73296ace186451b096b075461634c153d1fa525
See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22655#47
and below about PCRE_MULTILINE.
2016-11-18 17:32:59 +0000, Stephane Chazelas:
[...]
> And PCRE_MULTILINE now can only improve performance.
[...]
Sorry, I meant "and *removing* PCRE_MULTILINE now can only
improve performance". (and we need to add PCRE_DOLLAR_ENDONLY or
otherwise printf 'a\n\0' | grep -Pz 'a$' would still match).
2016-11-18 09:47:50 -0800, Paul Eggert:
> Stephane Chazelas wrote:
> >$ time grep -Pz '(?-m)^/' ~/a > /dev/null
>
> It looks like you want "^" to stand for a newline character, not the
> start of a line. That is not how grep -z works. -z causes the
2016-11-18 15:37:16 -0800, Paul Eggert:
[...]
> >That might have been the case a long time ago, as I remember
> >some discussion about it as it explained some wrong information
> >in the documentation, but as far as I and gdb can tell, grep
> >2.26 at least call pcre_exec for every line of the inpu
2016-11-19 03:22:23 -0800, Paul Eggert:
> Stephane Chazelas wrote:
>
> >I don't know the details of why it's done that way, but I'm not
> >sure I can see how calling pcre_exec that way can be quicker
> >than calling it on each individual line/record.
>
2016-11-19 16:14:28 +0000, Stephane Chazelas:
[...]
> AFAICT, that optimisation only brings a little optimisation in
> some cases (and reduces performance in others) but also reduces
> functionality and breaks users' expectations. IMO, it's not
> worth it.
[...]
To be fair
2016-11-19 16:45:29 +0000, Stephane Chazelas:
> 2016-11-19 16:14:28 +0000, Stephane Chazelas:
> [...]
> > AFAICT, that optimisation only brings a little optimisation in
> > some cases (and reduces performance in others) but also reduces
> > functionality and breaks users'
2016-11-19 15:12:41 -0800, Paul Eggert:
> Aaron Crane wrote:
> >I'm not sure it's ideal to use the Perl documentation as
> >a comprehensive guide to PCRE:
>
> Unfortunately libpcre does not document the regular expression
> syntax it supports. Neither does the grep manual, for 'grep -P'. And
> as
2016-11-20 00:03:26 -0800, Paul Eggert:
> As all the bugs in this bug report appear to be fixed, I'm closing it now.
Thanks.
BTW, I was wrong about
static char const xprefix[] = "^(?:";
static char const xsuffix[] = ")$";
(as opposed to \A, \z) causing a problem with grep -Pxz '(?m)...'
as
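The (?:...) in that -x wrapper matters as soon as the pattern contains an alternation. A sketch with hand-written wrappers, assuming a grep built with PCRE support; count is a hypothetical helper:

```shell
# Count the lines of text $2 that match PCRE pattern $1.
count() { printf '%s\n' "$2" | grep -cP -- "$1" || true; }

grouped_abc=$(count '^(?:a|b)$' 'abc')   # anchors apply to both alternatives
ungrouped_abc=$(count '^a|b$' 'abc')     # parsed as (^a)|(b$): "abc" starts with a
grouped_b=$(count '^(?:a|b)$' 'b')       # a whole-line "b" still matches

printf '%s %s %s\n' "$grouped_abc" "$ungrouped_abc" "$grouped_b"
```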
Hello,
In grep 2.26,
echo é | grep '[d-f]'
no longer matches in locales like fr_FR.iso885915@euro or
en_GB.iso88591 where the character set is single-byte like
ISO-8859-1. It still works OK with UTF-8.
2.25 was OK. git bisect points to commit
2769d5331a38d623b67b1860ac46b39ff7e54aca
Reproduce
2016-11-20 21:14:31 +0000, Stephane Chazelas:
[...]
> echo é | grep '[d-f]'
>
> no longer matches in locales like fr_FR.iso885915@euro or
> en_GB.iso88591 where the character set is single-byte like
> ISO-8859-1. It still works OK with UTF-8.
[...]
It also seems to sti
$ locale charmap
GB18030
$ printf '\uC9\n' | grep '.*7' | hd
00000000  81 30 87 37 0a  |.0.7.|
00000005
U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030).
$ printf '\uC9\n' | grep '.*0'
fails.
$ printf '\uC9\n' | grep -o '.*7'
returns wi
2016-11-20 16:38:29 -0500, Dennis Clarke:
[...]
> On a Solaris 10 system the locales are named a bit different :
> dasoyva_$ locale -a
[...]
> en_US.ISO8859-1
> en_US.ISO8859-15
[...]
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
>
> I am n
2016-11-20 16:38:29 -0500, Dennis Clarke:
> On 11/20/2016 04:14 PM, Stephane Chazelas wrote:
> >printf '\351\n' | LC_ALL=en_US.iso88591
>
> On a Solaris 10 system
[...]
FWIW, on Solaris 11, it looks as if (speculated from very few
tests) GNU grep's ranges ([x-y]
2016-11-20 21:50:28 +0000, Stephane Chazelas:
> $ locale charmap
> GB18030
> $ printf '\uC9\n' | grep '.*7' | hd
> 00000000  81 30 87 37 0a  |.0.7.|
> 00000005
>
> U+00C9's encoding does end in the 0x37 byte (7
2017-03-06 16:44:01 -0600, Eric Blake:
[...]
> > To reproduce run in empty directory
> >
> > echo "start_string_20091210.end" | grep -o
> > [1-2][0][0-1][0-9][0-1][0-9][0-3][0-9]
>
> Underquoted. This asks the shell to perform globbing for all files that
> match those patterns. Initially, none
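The effect is easy to reproduce in a scratch directory. A sketch, assuming a POSIX-style shell with default globbing (no nullglob or failglob) and mktemp available; the file name 10000000 is made up for the demonstration:

```shell
old=$PWD
dir=$(mktemp -d)
cd "$dir"
input=start_string_20091210.end

# Empty directory: the glob matches no file, so the shell passes it through unchanged.
before=$(echo "$input" | grep -o [1-2][0][0-1][0-9][0-1][0-9][0-3][0-9])

# Once a matching file name exists, the shell substitutes it for the pattern
# and grep ends up searching for the literal string 10000000.
touch 10000000
after=$(echo "$input" | grep -o [1-2][0][0-1][0-9][0-1][0-9][0-3][0-9] || true)

# Quoting keeps the pattern away from the shell in both situations.
quoted=$(echo "$input" | grep -o '[1-2][0][0-1][0-9][0-1][0-9][0-3][0-9]')

cd "$old"
rm -rf "$dir"
printf 'before=%s after=%s quoted=%s\n' "$before" "$after" "$quoted"
```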
2019-07-22 16:16:43 -0600, Assaf Gordon:
[...]
> or the more robust:
>
>printf "%s" "$variable_i'm_looking_in" | grep -q "$thing_i'm_looking_for"
[...]
Preferably (POSIXly):
printf "%s\n" "$variable_i'm_looking_in" | grep -qe "$thing_i'm_looking_for"
Or
printf "%s\n" "$variable_i'm_looking
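The -e matters when the pattern may begin with a dash, and the trailing \n keeps the input a proper text line. A sketch wrapping the command above in a function; contains is a hypothetical name and the sample strings are made up:

```shell
# contains TEXT REGEX: succeed if any line of TEXT matches REGEX.
# -e marks the next argument as the pattern even if it starts with "-".
contains() { printf '%s\n' "$1" | grep -qe "$2"; }

if contains 'rm -rf build' '-rf'; then dash=found; else dash=missing; fi
if contains 'rm -rf build' 'clean'; then other=found; else other=missing; fi

printf '%s %s\n' "$dash" "$other"
```

Without -e, grep would try to parse -rf as options instead of searching for it.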
2019-10-03 09:08:59 -0400, Tom Limoncelli:
[...]
>*-i*, *--ignore-case*
>
> Ignore case distinctions in both the PATTERN and the input
>
> files.
>
>
> Taking off my developer hat and putting on my writer/author hat, I have to
> agree that the man page could
2019-10-15 12:43:14 +1100, Trent W. Buck:
[...]
> > GNU grep already has options for fixed strings (-F),
> > and BRE, ERE and PCRE. Can we have one for glob(7)?
> >
> > AFAICT nobody has asked for this before; this surprises me,
> > because it "feels" like it should be easy to implement.
> >
> > A
2019-11-15 13:38:38 -0800, Paul Eggert:
> On 11/15/19 11:06 AM, NIDE, Naoyuki wrote:
> > echo ba | LANG=ja_JP.eucjp grep -F -w a
> > outputs ba, but should output nothing.
>
> I don't observe this problem with GNU grep 3.3 on Fedora 31. Please try
> upgrading to grep 3.3, the current release. If t
2019-12-29 15:24:47 +0100, Martin Simons:
[...]
> martin@laptop:~/test$ grep 'Jantje*' school.txt
> Which delivers the desired output:
> Jantje
It also matches on Jantj? The * regexp operator matches 0 or
more of the preceding atom. So e* matches 0 or more "e"s. It's
not to be confused with the "*" s
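The same text, Jantje*, selects different names depending on whether it is read as a regexp or as a shell pattern. A sketch with made-up sample names:

```shell
names=$(printf 'Jan\nJantj\nJantje\nJantjeee\n')

# As a regexp: "Jantj" followed by zero or more "e", found anywhere in the line.
regex_hits=$(printf '%s\n' "$names" | grep 'Jantje*' | tr '\n' ' ')

# As a shell pattern: the literal prefix "Jantje" followed by anything.
glob_hits=
for n in $names; do
  case $n in
    Jantje*) glob_hits="$glob_hits$n " ;;
  esac
done

printf 'regex: %s\nglob:  %s\n' "$regex_hits" "$glob_hits"
```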
2019-12-29 10:46:10 -0800, Paul Eggert:
> On 12/29/19 6:24 AM, Martin Simons wrote:
> > It may not be the task of the grep project to provide a man page, but even
> > then I
> > feel there is an opportunity for improvement here
>
> Right on both counts. I installed the attached patches to improve
2019-12-29 18:59:22 -0800, Paul Eggert:
[...]
> > It should be
> >
> > grep -n -- 'f.*\.c$' *g*.h /dev/null
>
> Thanks, done in the attached patch.
[...]
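The extra /dev/null operand makes grep print the file name even when the glob expands to a single file. A sketch in a scratch directory, assuming mktemp is available; log.h and its content are made up for the demonstration:

```shell
old=$PWD
dir=$(mktemp -d)
printf 'foo.c\n' > "$dir/log.h"
cd "$dir"

# One file operand: grep omits the file name prefix.
alone=$(grep -n -- 'f.*\.c$' *g*.h)

# Two operands, one of them empty: the prefix is always printed.
with_null=$(grep -n -- 'f.*\.c$' *g*.h /dev/null)

cd "$old"
rm -rf "$dir"
printf '%s\n%s\n' "$alone" "$with_null"
```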
There were a few other instances. The patch below attempts to
fix those (includes your patch).
I've also replaced find|xargs with the standard
2019-12-30 01:59:09 -0800, Paul Eggert:
> Thanks, I installed the attached, which is a bit more conservative than what
> you
> suggested but which I hope catches all the issues.
[...]
> @example
> -grep -l 'main' *.c
> +grep -l 'main' test-*.c
> @end example
[...]
Thanks,
though I find it's a
2020-05-01 19:05:28 +0200, radisso...@web.de:
[...]
> problem: grep for a character where only the hexcode is known.
>
> solution:use $'\xNN'
> then shell expands this to the required code
>
> example: printf "A\nB\nC\n" | grep $'\x41'
[...]
The $'\x41' ksh93 q
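For shells that lack $'...' quoting, printf can produce the character instead. This is a sketch of one alternative and not necessarily what the message above goes on to recommend; it relies on printf's octal escapes, and hex 41 is octal 101:

```shell
# Build the character from its octal code, then pass it to grep as the pattern.
char=$(printf '\101')
hit=$(printf 'A\nB\nC\n' | grep -e "$char")

printf '%s\n' "$hit"
```

Command substitution strips trailing newlines, so this approach does not suit a pattern that must end in a newline character.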