bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate

2023-04-03 Thread Koen Claessen
I found it when I was testing various new regular expression algorithms against grep (which I used as the golden standard for this). I used a random generator for regular expressions (using the QuickCheck framework) and then shrinking/delta debugging to automatically find the smallest failing test
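
As a rough illustration of the workflow described above (random regular expressions, then shrinking to a minimal failing case), here is a minimal Python sketch. It is not QuickCheck and not the code used in the report. The AST encoding, the helper names (`gen`, `render`, `candidates`, `shrink`), and the `stand_in_fails` predicate are all hypothetical. In a real run the predicate would presumably invoke grep under a timeout. Here it only flags a star whose body can match the empty string, so the example stays self-contained.

```python
import random

# Regex AST as nested tuples (hypothetical encoding):
#   ("eps",)  ("chr", c)  ("alt", l, r)  ("cat", l, r)  ("star", r)

def gen(rng, depth):
    """Generate a random regex AST over the one-letter alphabet {a}."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice([("eps",), ("chr", "a")])
    kind = rng.choice(["alt", "cat", "star"])
    if kind == "star":
        return ("star", gen(rng, depth - 1))
    return (kind, gen(rng, depth - 1), gen(rng, depth - 1))

def render(node):
    """Render an AST as an ERE string, parenthesizing every compound node."""
    kind = node[0]
    if kind == "eps":
        return "()"
    if kind == "chr":
        return node[1]
    if kind == "star":
        return "(" + render(node[1]) + ")*"
    sep = "|" if kind == "alt" else ""
    return "(" + render(node[1]) + sep + render(node[2]) + ")"

def candidates(node):
    """Yield strictly smaller ASTs: each child, then the node with one child shrunk."""
    kind = node[0]
    if kind in ("eps", "chr"):
        return
    children = node[1:]
    yield from children
    for i, child in enumerate(children):
        for smaller in candidates(child):
            yield (kind,) + children[:i] + (smaller,) + children[i + 1:]

def shrink(node, fails):
    """Greedily move to the first smaller candidate that still fails."""
    while True:
        for cand in candidates(node):
            if fails(cand):
                node = cand
                break
        else:
            return node  # no smaller failing candidate: locally minimal

def nullable(node):
    """True if the regex can match the empty string."""
    kind = node[0]
    if kind in ("eps", "star"):
        return True
    if kind == "chr":
        return False
    if kind == "alt":
        return nullable(node[1]) or nullable(node[2])
    return nullable(node[1]) and nullable(node[2])

def stand_in_fails(node):
    """Stand-in for 'grep misbehaves': some star has a nullable body (assumption)."""
    if node[0] in ("eps", "chr"):
        return False
    if node[0] == "star" and nullable(node[1]):
        return True
    return any(stand_in_fails(child) for child in node[1:])

# AST with the same shape as the reported pattern ((()|a)|())*
reported = ("star", ("alt", ("alt", ("eps",), ("chr", "a")), ("eps",)))
minimal = shrink(reported, stand_in_fails)
```

Because every candidate is strictly smaller than its parent, `shrink` always terminates. The result is only locally minimal with respect to whatever predicate is supplied.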

bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate

2023-04-03 Thread Paul Eggert
On 2023-04-03 05:07, Koen Claessen wrote: BTW, if you are interested, I could do a larger more targeted effort stress testing grep like this and possibly find more test cases with unexpected behavior. I would need some guidance on where to put most effort in order to be as useful as this can be.

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread Paul Eggert
The meaning of AC_SYS_LARGEFILE has changed to no longer even try to use wider time_t if available. So use AC_SYS_YEAR2038 as well. A more-aggressive change would be to use the next Autoconf’s AC_SYS_YEAR2038_REQUIRED but at least let’s restore the grep 3.8 behavior. * NEWS: Mention this. * bootst

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread Jim Meyering
On Mon, Apr 3, 2023 at 10:34 AM Paul Eggert wrote: > The meaning of AC_SYS_LARGEFILE has changed to no longer even try > to use wider time_t if available. So use AC_SYS_YEAR2038 as well. > A more-aggressive change would be to use the next Autoconf’s > AC_SYS_YEAR2038_REQUIRED but at least let’s r

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread Paul Eggert
On 2023-04-03 10:52, Jim Meyering wrote: I wanted to see how this would make grep fail, but don't have convenient access to such hosts. Would this trigger the failure? touch -t 20390101 f grep ^ f Yes, that triggers it. Of course one needs a "touch" and a filesystem that supports su

bug#60690: -P '\d' in GNU and git grep

2023-04-03 Thread Paul Eggert
I've recently done some bug-report maintenance about a set of GNU grep bug reports related to whether "grep -P '\d'" should match non-ASCII digits, and have some thoughts about coordinating GNU grep with git grep in this department. GNU Bug#62605[1] "`[\d]` does not work with PCRE" has
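
To make the question concrete, the sketch below shows the same ASCII-versus-Unicode split for `\d` in Python's `re` module. This is only an analogy: it says nothing about what PCRE2, GNU grep, or git grep actually do. The helper name `matches_digit` is made up for illustration.

```python
import re

def matches_digit(text, ascii_only):
    """Return True if `text` is a single character matched by \\d in Python's re."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\d", text, flags=flags) is not None

arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit

unicode_mode = matches_digit(arabic_indic_three, ascii_only=False)
ascii_mode = matches_digit(arabic_indic_three, ascii_only=True)
```

In Python's default (Unicode) mode for `str` patterns the non-ASCII digit matches. With `re.ASCII` only `0` through `9` do.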

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-03 Thread Paul Eggert
Recent commits in Git do the following to work around bugs in PCRE2. Quite possibly GNU grep -P should do the same, when in a UTF-8 locale. * Disable PCRE2_UCP unless PCRE2 10.35 or higher. * If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then enable PCRE2_NO_START_OPTIMIZE unles

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread Jim Meyering
On Mon, Apr 3, 2023 at 11:20 AM Paul Eggert wrote: > On 2023-04-03 10:52, Jim Meyering wrote: > > I wanted to see how this would make grep fail, but don't > > have convenient access to such hosts. Would this trigger the failure? > > > >touch -t 20390101 f > >grep ^ f > > Yes, that trig

bug#60690: -P '\d' in GNU and git grep

2023-04-03 Thread Jim Meyering
On Mon, Apr 3, 2023 at 2:39 PM Paul Eggert wrote: > I've recently done some bug-report maintenance about a set of GNU grep > bug reports related to whether "grep -P '\d'" should match > non-ASCII digits, and have some thoughts about coordinating GNU grep > with git grep in this department.

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-03 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote: > >* Disable PCRE2_UCP unless PCRE2 10.35 or higher. this is because of a bug in JIT, alternatively JIT could be disabled >* If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then > enable PCRE2_NO_START_OPTIMIZE unless PCRE2 10.36

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-03 Thread Paul Eggert
On 2023-04-03 23:17, Carlo Arenas wrote: On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote: * Disable PCRE2_UCP unless PCRE2 10.35 or higher. this is because of a bug in JIT, alternatively JIT could be disabled Oh, that might be better as it doesn't affect behavior (just performance).
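
The two alternatives discussed here (drop PCRE2_UCP, or keep it and disable JIT, when PCRE2 is older than 10.35) can be sketched as a small version gate. This is a Python illustration of the decision only, not grep or Git code. The function name, the `prefer_disable_jit` parameter, and the returned dictionary keys are hypothetical.

```python
def pcre2_workaround(major, minor, prefer_disable_jit):
    """Pick settings for a given PCRE2 version (hypothetical helper).

    Assumes, per the thread, that the problem affects PCRE2 before 10.35.
    """
    affected = (major, minor) < (10, 35)
    if not affected:
        return {"ucp": True, "jit": True}
    if prefer_disable_jit:
        # Keeps matching behavior, costs performance.
        return {"ucp": True, "jit": False}
    # Keeps JIT, but changes which characters classes such as \d match.
    return {"ucp": False, "jit": True}
```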

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread arnold
Jim Meyering wrote: > Thanks, Paul. > I wanted to see how this would make grep fail, but don't > have convenient access to such hosts. Would this trigger the failure? > > touch -t 20390101 f > grep ^ f > > How does it fail? Why in the world does grep even care about timestamps on files?

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread Paul Eggert
On 2023-04-03 23:35, arn...@skeeve.com wrote: Why in the world does grep even care about timestamps on files? The same reason 'awk' does. grep calls stat, fstat, etc., and the call fails with errno == EOVERFLOW if the file's timestamp is past the year 2038. For the same reason I expect that
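
The arithmetic behind the EOVERFLOW case can be checked with a short Python sketch. It assumes a signed 32-bit time_t and takes 2039-01-01 00:00:00 UTC as the timestamp, a date assumed from the `touch` example in this thread. The helper name `fits_in_32bit_time_t` is made up for illustration, and the sketch does not call stat itself.

```python
import calendar
from datetime import datetime, timezone

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def fits_in_32bit_time_t(epoch_seconds):
    """True if the timestamp is representable in a signed 32-bit time_t."""
    return INT32_MIN <= epoch_seconds <= INT32_MAX

# Last second a signed 32-bit time_t can represent, shown in UTC.
last_representable = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc).isoformat()

# Seconds since the epoch for 2039-01-01 00:00:00 UTC.
ts_2039 = calendar.timegm((2039, 1, 1, 0, 0, 0))
overflows = not fits_in_32bit_time_t(ts_2039)
```

A file stamped in 2039 therefore carries a timestamp that a 32-bit time_t cannot hold, which is the situation in which stat and fstat are described above as failing with EOVERFLOW.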

bug#62647: [INSTALL] grep: re-fix Y2038 bug on glibc 2.34+ x86, ARM

2023-04-03 Thread arnold
Paul Eggert wrote: > On 2023-04-03 23:35, arn...@skeeve.com wrote: > > Why in the world does grep even care about timestamps on files? > > The same reason 'awk' does. grep calls stat, fstat, etc., and the call > fails with errno == EOVERFLOW if the file's timestamp is past the year 2038. > > For

bug#60690: -P '\d' in GNU and git grep

2023-04-03 Thread Paul Eggert
On 2023-04-03 20:30, Jim Meyering wrote: have you seen justification (other than for compatibility with some other tool or language) for allowing \d to match non-ASCII by default, in spite of the risks? In the example Ævar supplied in , my impression was that it was

bug#60690: -P '\d' in GNU and git grep

2023-04-03 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 2:38 PM Paul Eggert wrote: > > In researching this a bit further, I found that on March 23 Git disabled > the use of PCRE2_UCP in PCRE2 10.34 or earlier[6], due to a PCRE2 bug > that can cause a crash when PCRE2_UCP is used[7]. A bug fix[8] should > appear in the next PCRE2