bug#16865: grep -wP and backreferences

2014-02-24 Thread Stephane Chazelas
Hello, Backreferences don't work with -w or -x in combination with -P: $ echo aa | grep -Pw '(.)\1' $ Or they work in an unexpected way: $ echo aa | grep -Pw '(.)\2' aa The fix is simple: --- src/pcresearch.c~ 2014-02-24 09:59:56.864374362 +0000 +++ src/pcresearch.c 2014-02-24 07:33:04.
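
The output above is what one would expect if the user's pattern ends up inside an extra capturing group, so that every group number shifts by one. That reading is an inference from the symptom shown, not a quote from the patch. A minimal sketch of the same renumbering effect, using plain BRE so that it does not depend on PCRE support (the variable names are only for illustration):

```shell
# The user's pattern: group 1 is "(.)", and \1 repeats it.
plain=$(echo aa | grep '\(.\)\1' || true)

# The same pattern inside one more capturing group: the inner group is
# now number 2, so the backreference has to be written \2 to mean the same.
wrapped=$(echo aa | grep '\(\(.\)\2\)' || true)

printf 'plain: [%s]\nwrapped: [%s]\n' "$plain" "$wrapped"
```

Both commands print aa. A non-capturing wrapper such as (?:...) in PCRE syntax avoids the shift, because it does not consume a group number.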

bug#16865: grep -wP and backreferences

2014-02-24 Thread Stephane Chazelas
...\b instead of \b(?:...)\b, so it could be that those brackets are not necessary in the first place. Sorry I lied, it was not the last note ;-). Note the difference: $ echo a@@b | grep -w @@ $ echo a@@b | grep -Pw @@ a@@b Maybe instead of \b(?:...)\b, we could use (? On Mon, Feb 24,
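
For reference, the non-PCRE behaviour shown in that message can be reproduced with the sketch below. It assumes GNU grep, where -w requires that the match is not adjacent to a word constituent (letter, digit or underscore) on either side; the variable names are only for illustration.

```shell
# "@@" is glued to "a" and "b", which are word constituents: no match.
glued=$(echo 'a@@b' | grep -w '@@' || true)

# "@@" is surrounded by spaces: the line is printed.
spaced=$(echo 'a @@ b' | grep -w '@@' || true)

printf 'glued: [%s]\nspaced: [%s]\n' "$glued" "$spaced"
```

A \b-based wrapper behaves differently here because \b only asserts a word/non-word transition, and there is one between "a" and "@".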

bug#16867: [bug #37600] grep -w cuts words on non-ascii

2014-02-24 Thread Stephane Chazelas
2014-02-24 08:53:17 -0800, Jim Meyering: [...] > This is pretty serious: > > $ printf 'p\xc3\xa8re\n' |LC_ALL=en_US.utf8 grep -w p > père It gets more complicated with combining characters: $ printf 'pe\314\200re\n' | grep -w pe père You can't expect \w to match U+0300 alone. You can't e

bug#16871: problems about matching newline (with -z)

2014-02-24 Thread Stephane Chazelas
The doc has a confusing statement: > 15. How can I match across lines? > >Standard grep cannot do this, as it is fundamentally line-based. >Therefore, merely using the '[:space:]' character class does not >match newlines in the way you might expect. However, if your grep >is compi

bug#16871: Acknowledgement (problems about matching newline (with -z))

2014-02-25 Thread Stephane Chazelas
Also: $ printf 'a\nb\0' | grep -z 'a$' $ printf 'a\nb\0' | grep -zP 'a$' a b $ printf 'a\nb\0' | grep -zxP a a b Why use PCRE_MULTILINE here?

bug#16865: grep -wP and backreferences

2014-02-25 Thread Stephane Chazelas
2014-02-24 20:55:42 -0800, Jim Meyering: > On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas > wrote: > > A last note: with -w, pcregrep wraps the regexp in \b...\b > > instead of \b(?:...)\b, so it could be that those brackets are > > not necessary in the first place. T

bug#16865: grep -wP and backreferences

2014-02-25 Thread Stephane Chazelas
2014-02-25 10:03:28 -0800, Jim Meyering: [...] > I've uncapitalized the 1-line summary, changed a That to This > in the log, and added examples to NEWS, and added an empty > line to restore the 2-empty-line section delimiter. [...] > I've closed this ticket, and will push once you ack these changes

bug#21865: Parenthesis subexpressions

2015-11-09 Thread Stephane Chazelas
2015-11-08 21:49:03 +0100, Valerio Bozzolan: > Sorry... typo... > > echo abcde | grep -o -E 'b([a-z])d' > => "bcd" > > Can't I choose to have only "c"? [...] That's correct, GNU grep doesn't have that capability (yet). Recent versions of pcregrep do: $ echo abc | pcregrep -o1 '.(.).' b
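
Where pcregrep is not available, one common workaround (an assumption on my part, not something taken from the visible part of this message) is to let sed print the capture group. A minimal sketch for the example from the question:

```shell
# -n suppresses sed's default output; the p flag prints only the lines
# on which the substitution succeeded, reduced to capture group 1.
only_group=$(echo abcde | sed -n 's/.*b\([a-z]\)d.*/\1/p')
echo "$only_group"
```

Unlike grep -o, this yields at most one result per input line, so it is not a drop-in replacement when a line can contain several matches.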

bug#21865: Parenthesis subexpressions

2015-11-09 Thread Stephane Chazelas
2015-11-09 17:00:36 +0100, Valerio Bozzolan: > Thanks for agreeing with the evolution of the meaning of "-o". > > Just to make you a laugh: I was reproducing egrep with $BASH_REMATCH: > https://gist.github.com/valerio-bozzolan/6787675e931dce1ba7e9 > > Definitely not beautiful... but really effect

bug#21989: grep search by ASCII code unsuccessful

2015-11-23 Thread Stephane Chazelas
2015-11-22 21:24:05 -0800, Shivanshu Goyal: [...] > I think I found a bug which did not exist in version 2.14, but does seem to > exist in versions 2.16 and 2.22. I have not tested any other versions. > > Say there is a file with the following contents: > > shivanshu@thetis:tmp$ cat temp | xxd >

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
Hello, I'm just finding out that ^ and $ no longer work with grep -Pz: https://unix.stackexchange.com/questions/324263/grep-command-doesnt-support-and-anchors-when-its-with-pz $ grep -Pz '^' grep: unescaped ^ or $ not supported with -Pz Which points to this bug. Note that, it's not that pcre do

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
2016-11-18 07:54:29 -0800, Paul Eggert: > Stephane Chazelas wrote: > >Can that bug please be reopened so it can be addressed > >differently (PCRE_MULTILINE removed, PCRE_DOLLAR_ENDONLY added)? > > Removing PCRE_MULTILINE will make 'grep' way slower. Can you >

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
2016-11-18 16:24:02 +0000, Stephane Chazelas: [...] > Why would it make it slower? AFAICT, PCRE_MULTILINE *adds* > some overhead. It is really meant to be only about changing the > behaviour of ^ and $. [...] Unsurprisingly, where ~/a contains the output of find / -print0 (which is

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
2016-11-18 08:48:04 -0800, Paul Eggert: > Stephane Chazelas wrote: > >Why would it make it slower? AFAICT, PCRE_MULTILINE *adds* > >some overhead. > > As I understand it, PCRE_MULTILINE lets 'grep' apply a pattern to an > entire buffer that contains many lines

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
2016-11-18 17:06:36 +0000, Stephane Chazelas: [...] > That might have been the case a long time ago, as I remember > some discussion about it as it explained some wrong information > in the documentation, but as far as I and gdb can tell, grep > 2.26 at least calls pcre_exec for every

bug#16871: doc/test confusions with grep -P

2016-11-18 Thread Stephane Chazelas
For the record, the doc/test confusion was fixed by commit b73296ace186451b096b075461634c153d1fa525 http://git.savannah.gnu.org/cgit/grep.git/commit/?id=b73296ace186451b096b075461634c153d1fa525 See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22655#47 and below about PCRE_MULTILINE.

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
2016-11-18 17:32:59 +0000, Stephane Chazelas: [...] > And PCRE_MULTILINE now can only improve performance. [...] Sorry, I meant "and *removing* PCRE_MULTILINE now can only improve performance". (and we need to add PCRE_DOLLAR_ENDONLY or otherwise printf 'a\n\0' | grep -Pz 'a$' would still match).

bug#22655: grep -Pz '^' now fails!

2016-11-18 Thread Stephane Chazelas
2016-11-18 09:47:50 -0800, Paul Eggert: > Stephane Chazelas wrote: > >$ time grep -Pz '(?-m)^/' ~/a > /dev/null > > It looks like you want "^" to stand for a newline character, not the > start of a line. That is not how grep -z works. -z causes the

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-18 15:37:16 -0800, Paul Eggert: [...] > >That might have been the case a long time ago, as I remember > >some discussion about it as it explained some wrong information > >in the documentation, but as far as I and gdb can tell, grep > >2.26 at least calls pcre_exec for every line of the inpu

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 03:22:23 -0800, Paul Eggert: > Stephane Chazelas wrote: > > >I don't know the details of why it's done that way, but I'm not > >sure I can see how calling pcre_exec that way can be quicker > >than calling it on each individual line/record. >

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 16:14:28 +0000, Stephane Chazelas: [...] > AFAICT, that optimisation only brings a little optimisation in > some cases (and reduces performance in others) but also reduces > functionality and breaks users' expectations. IMO, it's not > worth it. [...] To be fair

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 16:45:29 +0000, Stephane Chazelas: > 2016-11-19 16:14:28 +0000, Stephane Chazelas: > [...] > > AFAICT, that optimisation only brings a little optimisation in > > some cases (and reduces performance in others) but also reduces > > functionality and breaks users'

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 15:12:41 -0800, Paul Eggert: > Aaron Crane wrote: > >I'm not sure it's ideal to use the Perl documentation as > >a comprehensive guide to PCRE: > > Unfortunately libpcre does not document the regular expression > syntax it supports. Neither does the grep manual, for 'grep -P'. And > as

bug#22655: grep -Pz '^' now fails!

2016-11-20 Thread Stephane Chazelas
2016-11-20 00:03:26 -0800, Paul Eggert: > As all the bugs in this bug report appear to be fixed, I'm closing it now. Thanks. BTW, I was wrong about static char const xprefix[] = "^(?:"; static char const xsuffix[] = ")$"; (as opposed to \A, \z) causing a problem with grep -Pxz '(?m)...' as

bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales

2016-11-20 Thread Stephane Chazelas
Hello, In grep 2.26, echo é | grep '[d-f]' no longer matches in locales like fr_FR.iso885915@euro or en_GB.iso88591 where the character set is single-byte like ISO-8859-1. It still works OK with UTF-8. 2.25 was OK. git bisect points to commit 2769d5331a38d623b67b1860ac46b39ff7e54aca Reproduce

bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales

2016-11-20 Thread Stephane Chazelas
2016-11-20 21:14:31 +0000, Stephane Chazelas: [...] > echo é | grep '[d-f]' > > no longer matches in locales like fr_FR.iso885915@euro or > en_GB.iso88591 where the character set is single-byte like > ISO-8859-1. It still works OK with UTF-8. [...] It also seems to sti

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-20 Thread Stephane Chazelas
$ locale charmap GB18030 $ printf '\uC9\n' | grep '.*7' | hd 00000000 81 30 87 37 0a |.0.7.| 00000005 U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030). $ printf '\uC9\n' | grep '.*0' fails. $ printf '\uC9\n' | grep -o '.*7' returns wi

bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales

2016-11-20 Thread Stephane CHAZELAS
2016-11-20 16:38:29 -0500, Dennis Clarke: [...] > On a Solaris 10 system the locales are named a bit differently : > dasoyva_$ locale -a [...] > en_US.ISO8859-1 > en_US.ISO8859-15 [...] > dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v > 0000000 e9 0a > 0000002 > > I am n

bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales

2016-11-20 Thread Stephane CHAZELAS
2016-11-20 16:38:29 -0500, Dennis Clarke: > On 11/20/2016 04:14 PM, Stephane Chazelas wrote: > >printf '\351\n' | LC_ALL=en_US.iso88591 > > On a Solaris 10 system [...] FWIW, on Solaris 11, it looks as if (speculated from very few tests) GNU grep's ranges ([x-y]

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-20 Thread Stephane Chazelas
2016-11-20 21:50:28 +0000, Stephane Chazelas: > $ locale charmap > GB18030 > $ printf '\uC9\n' | grep '.*7' | hd > 00000000 81 30 87 37 0a |.0.7.| > 00000005 > > U+00C9's encoding does end in the 0x37 byte (7

bug#26005: bug in grep

2017-03-07 Thread Stephane Chazelas
2017-03-06 16:44:01 -0600, Eric Blake: [...] > > To reproduce run in empty directory > > > > echo "start_string_20091210.end" | grep -o > > [1-2][0][0-1][0-9][0-1][0-9][0-3][0-9] > > Underquoted. This asks the shell to perform globbing for all files that > match those patterns. Initially, none
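
The globbing effect described in that reply can be reproduced in a scratch directory. The sketch below is only an illustration: the file name 10000000 is a hypothetical stray file chosen because it matches the unquoted bracket expressions.

```shell
# Work in a throwaway directory so the stray file affects nothing else.
dir=$(mktemp -d) && cd "$dir" || exit 1
: > 10000000    # a file whose name the unquoted pattern globs to

# Unquoted: the shell replaces the pattern with "10000000" before grep
# runs, so grep searches for that literal text and finds nothing.
unquoted=$(echo 'start_string_20091210.end' |
  grep -o [1-2][0][0-1][0-9][0-1][0-9][0-3][0-9] || true)

# Quoted: grep receives the bracket expressions as intended.
quoted=$(echo 'start_string_20091210.end' |
  grep -o '[1-2][0][0-1][0-9][0-1][0-9][0-3][0-9]' || true)

printf 'unquoted: [%s]\nquoted: [%s]\n' "$unquoted" "$quoted"
cd / && rm -rf "$dir"
```

The quoted form prints 20091210 regardless of what files happen to be in the current directory.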

bug#36764: grep -q throwing up No such file or directory error

2019-07-23 Thread Stephane Chazelas
2019-07-22 16:16:43 -0600, Assaf Gordon: [...] > or the more robust: > >printf "%s" "$variable_i'm_looking_in" | grep -q "$thing_i'm_looking_for" [...] Preferably (POSIXly): printf "%s\n" "$variable_i'm_looking_in" | grep -qe "$thing_i'm_looking_for" Or printf "%s\n" "$variable_i'm_looking
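
A minimal sketch of why the -e and the trailing newline matter; the variable names and sample values are hypothetical. Without -e, a pattern that starts with "-" would be parsed as an option, and the "\n" makes the input a complete line of text, which is what grep is specified to work on.

```shell
haystack='flags: -x -y'
needle='-x'    # starts with "-", so it needs -e to be taken as a pattern

if printf '%s\n' "$haystack" | grep -qe "$needle"; then
  found=yes
else
  found=no
fi
echo "$found"
```

The needle is still interpreted as a regular expression here; adding -F would make grep treat it as a fixed string.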

bug#37604: man page description of -i confused someone

2019-10-04 Thread Stephane Chazelas
2019-10-03 09:08:59 -0400, Tom Limoncelli: [...] >*-i*, *--ignore-case* > > Ignore case distinctions in both the PATTERN and the input > > files. > > > Taking off my developer hat and putting on my writer/author hat, I have to > agree that the man page could

bug#37753: wish for glob(7)-like matcher

2019-11-13 Thread Stephane Chazelas
2019-10-15 12:43:14 +1100, Trent W. Buck: [...] > > GNU grep already has options for fixed strings (-F), > > and BRE, ERE and PCRE. Can we have one for glob(7)? > > > > AFAICT nobody has asked for this before; this surprises me, > > because it "feels" like it should be easy to implement. > > > > A

bug#38223: grep >=2.28 cannot handle -wF correctly under LANG=ja_JP.eucjp

2019-11-15 Thread Stephane Chazelas
2019-11-15 13:38:38 -0800, Paul Eggert: > On 11/15/19 11:06 AM, NIDE, Naoyuki wrote: > > echo ba | LANG=ja_JP.eucjp grep -F -w a > > outputs ba, but should output nothing. > > I don't observe this problem with GNU grep 3.3 on Fedora 31. Please try > upgrading to grep 3.3, the current release. If t

bug#38792: man grep

2019-12-29 Thread Stephane Chazelas
2019-12-29 15:24:47 +0100, Martin Simons: [...] > martin@laptop:~/test$ grep 'Jantje*' school.txt > Which delivers the desired output: > Jantje It also matches on Jantj? The * regexp operator matches 0 or more of the preceding atom. So e* matches 0 or more "e"s. It's not to be confused with the "*" s
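
The point about e* can be checked with a few sample lines (the input words below are made up for the illustration, they are not the contents of school.txt):

```shell
# 'Jantje*' means "Jantj" followed by zero or more "e" characters.
matches=$(printf '%s\n' Jantj Jantje Jantjee Jan | grep 'Jantje*')
echo "$matches"
```

This prints Jantj, Jantje and Jantjee, but not Jan, since the line only has to contain "Jantj" for the pattern to match.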

bug#38792: man grep

2019-12-29 Thread Stephane Chazelas
2019-12-29 10:46:10 -0800, Paul Eggert: > On 12/29/19 6:24 AM, Martin Simons wrote: > > It may not be the task of the grep project to provide a man page, but even > > then I > > feel there is an opportunity for improvement here > > Right on both counts. I installed the attached patches to improve

bug#38792: man grep

2019-12-30 Thread Stephane Chazelas
2019-12-29 18:59:22 -0800, Paul Eggert: [...] > > It should be > > > > grep -n -- 'f.*\.c$' *g*.h /dev/null > > Thanks, done in the attached patch. [...] There were a few other instances. The patch below attempts to fix those (includes your patch). I've also replaced find|xargs to the standard
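
A short sketch of what the two additions in that command do, using a hypothetical file named log.h in a scratch directory: the -- stops a file name that starts with "-" from being taken as an option, and the extra /dev/null operand makes grep print the file name even when the glob expands to a single file.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'foo.c\n' > log.h    # the only file matching the *g*.h glob

# One file operand: grep omits the file name prefix.
without=$(grep -n -- 'f.*\.c$' *g*.h)

# Two file operands: every match is prefixed with its file name.
with_null=$(grep -n -- 'f.*\.c$' *g*.h /dev/null)

printf '%s\n%s\n' "$without" "$with_null"
cd / && rm -rf "$dir"
```

The first command prints 1:foo.c and the second log.h:1:foo.c.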

bug#38792: man grep

2019-12-30 Thread Stephane Chazelas
2019-12-30 01:59:09 -0800, Paul Eggert: > Thanks, I installed the attached, which is a bit more conservative than what > you > suggested but which I hope catches all the issues. [...] > @example > -grep -l 'main' *.c > +grep -l 'main' test-*.c > @end example [...] Thanks, though I find it's a

bug#41004: Documentation:enhancement - search for hexvalue

2020-05-10 Thread Stephane Chazelas
2020-05-01 19:05:28 +0200, radisso...@web.de: [...] > problem: grep for a character where only the hexcode is known. > > solution:use $'\xNN' > then shell expands this to the required code > > example: printf "A\nB\nC\n" | grep $'\x41' [...] The $'\x41' ksh93 q
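
Since the $'...' form is not understood by every sh implementation, a sketch of one alternative is to let printf produce the byte from its octal value. This is an illustration of mine rather than the suggestion made in the truncated reply; hex 41 is octal 101.

```shell
# Octal escapes in the printf format string are specified by POSIX.
byte=$(printf '\101')
hit=$(printf 'A\nB\nC\n' | grep -e "$byte")
echo "$hit"
```

This prints A. The approach does not work for a newline, which command substitution strips, nor for a NUL byte, which shell variables cannot hold.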