I found it when I was testing various new regular expression algorithms against grep (which I used as the golden standard for this).
I used a random generator for regular expressions (using the QuickCheck framework) and then shrinking/delta debugging to automatically find the smallest failing test case. BTW, if you are interested, I could do a larger more targeted effort stress testing grep like this and possibly find more test cases with unexpected behavior. I would need some guidance on where to put most effort in order to be as useful as this can be. I could find a MSc student to help out with that. Let me know if this sounds like an interesting thing to do! kind regards, /Koen On Sun, Apr 2, 2023 at 6:15 AM Jim Meyering <j...@meyering.net> wrote: > On Mon, Mar 27, 2023 at 6:15 AM Koen Claessen <k...@chalmers.se> wrote: > > Running the command: > > > > echo a | grep -E -w '((()|a)|())*' > > > > does not terminate, and uses a LOT of processor time, for all versions of > > grep I have tried. > > > > This is the smallest case that could be found; simplifying anything in > the > > input and/or expression leads to correct behavior. > > Thank you! How did you find that? > > FYI, this strikes grep-3.10 (on Fedora 37/glibc-2.36-9.fc37.x86_64) > when using LC_ALL=en_US.UTF-8, but not with LC_ALL=C. > I.e., this infloops: > echo a | LC_ALL=en_US.UTF-8 grep -E -w '((()|a)|())*' > > but this works as expected and promptly prints its line of input: > echo a | LC_ALL=C grep -E -w '((()|a)|())*' > > For now, I've added an expected-failing test case for this bug: >