bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Stephane Chazelas wrote: Why would it make it slower. AFAICT, PCRE_MULTILINE *adds* some overhead. After looking into it I now remember why PCRE_MULTILINE speeds things up. See: http://git.savannah.gnu.org/cgit/grep.git/commit/?id=f6603c4e1e04dbb87a7232c4b44acc6afdf65fef where using PCRE_MULT

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Stephane Chazelas wrote: one can use (?m) if he wants ^ to match the beginning of each line in the NUL-delimited record instead of just the beginning of the record. I think the intent is that ^ and $ should match only the line-terminator specified by -z (or by -z's absence). So the sort of usa
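
For readers following the thread, here is a minimal sketch of the distinction under discussion. It uses Python's `re` module as a stand-in, not PCRE or grep itself, on the assumption that `^` and `(?m)` behave analogously for this simple case. The sample data and the helper `starts` are hypothetical, made up for illustration.

```python
import re

# Hypothetical input: two NUL-terminated records, as with grep -z.
# The first record contains an embedded newline.
data = "first\nsecond\0third\0"
records = [r for r in data.split("\0") if r]

def starts(pattern, record):
    """Return the offsets at which `pattern` matches inside `record`."""
    return [m.start() for m in re.finditer(pattern, record)]

for rec in records:
    # Without (?m), ^ anchors only at the start of the record.
    # With (?m), ^ also anchors after each newline inside the record.
    print(repr(rec), starts(r"^\w+", rec), starts(r"(?m)^\w+", rec))
```

In this sketch the first record yields one match without `(?m)` and two with it, which is the record-versus-line anchoring the messages here are debating. It says nothing about how grep applies PCRE_MULTILINE internally.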

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Zev Weiss
On Sat, Nov 19, 2016 at 12:36:12AM -0800, Paul Eggert wrote: Stephane Chazelas wrote: one can use (?m) if he wants ^ to match the beginning of each line in the NUL-delimited record instead of just the beginning of the record. I think the intent is that ^ and $ should match only the line-termi

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Zev Weiss wrote: I see 'reflags' being tested in Pexecute(), but I don't see it getting set anywhere. Oops, that somehow got lost during merging. Thanks for catching that. Omitting the initialization hurt performance due to a failure to use PCRE_MULTILINE but did not cause a correctnes

bug#24941: Early termination bug in grep 2.26

2016-11-19 Thread Paul Eggert
This turned into more work than I expected, as I kept finding performance glitches and/or correctness bugs in the neighborhood. I installed the attached set of patches. Patch 03 is the crucial one. Patch 10 trivially fixes an earlier test of mine and I'm too lazy to write a separate email for it

bug#24961: input option -z alters behavior of output option -o in an undocumented way

2016-11-19 Thread Paul Eggert
vampyre...@gmail.com wrote: The documentation also states that -z affects only how input is interpreted. I don't see where it does that, as the current documentation for -z talks about both "input and output data". That being said, the manual could be clearer. I installed the attached. From 4

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-18 15:37:16 -0800, Paul Eggert: [...] > >That might have been the case a long time ago, as I remember > >some discussion about it as it explained some wrong information > >in the documentation, but as far as I and gdb can tell, grep > >2.26 at least calls pcre_exec for every line of the inpu

bug#24451: grep -Tn misbehaves under Emacs shell

2016-11-19 Thread Paul Eggert
No further comment, and I merged and installed the patches and am closing this bug report.

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Stephane Chazelas wrote: I don't know the details of why it's done that way, but I'm not sure I can see how calling pcre_exec that way can be quicker than calling it on each individual line/record. It can be hundreds of times faster in common cases. See: http://git.savannah.gnu.org/cgit/grep.

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Aaron Crane
Paul Eggert wrote: > Stephane Chazelas wrote: >> Removing PCRE_MULTILINE (and get back to calling pcre_exec on >> every record separately) would help except in the cases where the >> user does: >> >> grep -xzP '(?m)a' > > I don't think grep can address this problem, as in general that would > requ

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 03:22:23 -0800, Paul Eggert: > Stephane Chazelas wrote: > > >I don't know the details of why it's done that way, but I'm not > >sure I can see how calling pcre_exec that way can be quicker > >than calling it on each individual line/record. > > It can be hundreds of times faster in comm

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 16:14:28 +, Stephane Chazelas: [...] > AFAICT, that optimisation only brings a little optimisation in > some cases (and reduces performance in others) but also reduces > functionality and breaks users' expectations. IMO, it's not > worth it. [...] To be fair, that last part about us

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 16:45:29 +, Stephane Chazelas: > 2016-11-19 16:14:28 +, Stephane Chazelas: > [...] > > AFAICT, that optimisation only brings a little optimisation in > > some cases (and reduces performance in others) but also reduces > > functionality and breaks users' expectations. IMO, it's no

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Aaron Crane wrote: I'm not sure it's ideal to use the Perl documentation as a comprehensive guide to PCRE: Unfortunately libpcre does not document the regular expression syntax it supports. Neither does the grep manual, for 'grep -P'. And as you've mentioned, the Perl manual isn't a reliable

bug#24941: Early termination bug in grep 2.26

2016-11-19 Thread Jim Meyering
On Sat, Nov 19, 2016 at 1:47 AM, Paul Eggert wrote: > This turned into more work than I expected, as I kept finding performance > glitches and/or correctness bugs in the neighborhood. I installed the > attached set of patches. Patch 03 is the crucial one. Patch 10 trivially > fixes an earlier test

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Stephane Chazelas
2016-11-19 15:12:41 -0800, Paul Eggert: > Aaron Crane wrote: > >I'm not sure it's ideal to use the Perl documentation as > >a comprehensive guide to PCRE: > > Unfortunately libpcre does not document the regular expression > syntax it supports. Neither does the grep manual, for 'grep -P'. And > as

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Stephane Chazelas wrote: You've missed "man pcrepattern". Yes I did. Thanks. Google was not my friend there.

bug#22655: grep -Pz '^' now fails!

2016-11-19 Thread Paul Eggert
Stephane Chazelas wrote: I don't find a x220 factor, more like a x2.5 factor: I think I found the factor-of-hundreds slowdown, and fixed it in the 2nd attached patch. When I tried your benchmark with pcregrep (pcre 8.39, configured with --enable-unicode-properties), and with ./grep0 (which