https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78276

--- Comment #3 from James K. Lowden <jklowden at schemamania dot org> ---
Here is a nonpathological example, taken from a real-world problem, where
std::regex_search fails.

This pattern is part of the COBOL COPY text-manipulation directive: 

([[:space:]]+(LEADING|TRAILING))?[[:space:]]+("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==)[[:space:]]+BY[[:space:]]+(("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==)([[:space:]]*[.])?

That pattern has 21 captures.  Ignoring the optional LEADING/TRAILING clause,
it accepts 1 of 4 operands on either side of the BY keyword:

1.  a quoted string using the " double-quote
2.  a quoted string using the ' single-quote
3.  an identifier consisting of alphanumerics with hyphens or underscores
4.  pseudo-text delimited by == on both sides

Quoted strings in this syntax may include embedded quotes by doubling them. 
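
To illustrate the doubling rule in isolation, here is a minimal sketch that
applies only the double-quote sub-pattern from the expression above to a single
operand.  The helper names is_double_quoted and unquote are hypothetical and
are not part of the attached programs.  On short inputs like these,
std::regex has no trouble with this sub-pattern.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole operand is a double-quoted string
// in which any embedded double quote is written twice.
bool is_double_quoted(const std::string& operand) {
    // Sub-pattern copied from the full expression: "((["]{2}|[^"])*)"
    static const std::regex quoted(R"RX("((["]{2}|[^"])*)")RX");
    return std::regex_match(operand, quoted);
}

// Hypothetical helper: strips the outer quotes and collapses each doubled
// quote to a single one.  Operands that are not double-quoted are returned
// unchanged.
std::string unquote(const std::string& operand) {
    if (!is_double_quoted(operand)) return operand;
    std::string body;
    for (std::size_t i = 1; i + 1 < operand.size(); ++i) {
        body += operand[i];
        if (operand[i] == '"') ++i;  // skip the second quote of the pair
    }
    return body;
}
```

The single-quote form works the same way with ' in place of ".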

By "fails", I mean that it does not terminate in a reasonable time.  Using gdb
I have seen over 1900 stack frames inside std::regex_search.  This is with
gcc 11 on Linux.

I have recast the program using awk and POSIX regex(3) from the C library,
both of which succeed instantly.  I attach a tarball that includes all three
programs, the input, and a Makefile to demonstrate them.
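
For readers without the tarball, the regex(3) approach can be sketched roughly
as follows.  This is not the attached program: the ByClause struct, the
match_by_clause function and the sample inputs are assumptions made up for
illustration.  Only the pattern is copied verbatim from above.  Counting
parentheses, capture 3 is the operand before BY and capture 12 the operand
after it.

```cpp
#include <regex.h>   // POSIX regcomp/regexec/regfree, i.e. regex(3)
#include <cstddef>
#include <string>
#include <vector>

// The pattern from this report, verbatim.
static const char* const kReplacingPattern =
    R"RX(([[:space:]]+(LEADING|TRAILING))?[[:space:]]+("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==)[[:space:]]+BY[[:space:]]+(("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==)([[:space:]]*[.])?)RX";

// Hypothetical result type for this sketch.
struct ByClause {
    bool matched;          // did the pattern match anywhere in the text?
    std::size_t captures;  // number of capture groups reported by regcomp
    std::string before;    // operand before BY (capture 3)
    std::string after;     // operand after BY (capture 12)
};

ByClause match_by_clause(const std::string& text) {
    ByClause result{false, 0, "", ""};
    regex_t re;
    // REG_EXTENDED selects POSIX extended regular expression syntax.
    if (regcomp(&re, kReplacingPattern, REG_EXTENDED) != 0) return result;
    result.captures = re.re_nsub;

    std::vector<regmatch_t> m(re.re_nsub + 1);
    if (regexec(&re, text.c_str(), m.size(), m.data(), 0) == 0) {
        result.matched = true;
        // A capture that did not participate has rm_so == -1.
        auto group = [&](std::size_t i) {
            return m[i].rm_so < 0
                       ? std::string()
                       : text.substr(m[i].rm_so, m[i].rm_eo - m[i].rm_so);
        };
        result.before = group(3);
        result.after = group(12);
    }
    regfree(&re);
    return result;
}
```

Calling match_by_clause(" ==FOO== BY ==BAR==.") is one way to try it.  The
captures field is filled in from re_nsub, so it can be compared with the
count of 21 given above.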
