https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78276
--- Comment #3 from James K. Lowden <jklowden at schemamania dot org> ---
Here is a nonpathological example taken from a real-world problem where
std::regex_search fails.
This pattern is part of the COBOL COPY text-manipulation directive:
([[:space:]]+(LEADING|TRAILING))?[[:space:]]+("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==)[[:space:]]+BY[[:space:]]+(("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==)([[:space:]]*[.])?
That pattern has 21 captures. Ignoring the optional LEADING/TRAILING clause,
it accepts 1 of 4 operand forms on either side of the BY keyword:
1. a quoted string using the " double-quote
2. a quoted string using the ' single-quote
3. an identifier consisting of alphanumerics with hyphens or underscores
4. pseudo-text delimited by ==
Quoted strings in this syntax may include embedded quotes by doubling them.
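To illustrate the doubled-quote rule in isolation, here is a minimal sketch that applies only the double-quote alternative from the pattern above to a short input. It is not part of the attached tarball, the name quoted_body is made up for this example, and it assumes the default ECMAScript grammar of std::regex. On inputs this short the match completes without trouble.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: if `text` is exactly one double-quoted string,
// store the text between the outer quotes in `body` and return true.
// Embedded quotes are left doubled, as they appear in the source.
bool quoted_body(const std::string& text, std::string& body)
{
    // Same sub-pattern as the first operand alternative above.
    static const std::regex re(R"re("((["]{2}|[^"])*)")re");

    std::smatch m;
    if (!std::regex_match(text, m, re))
        return false;

    body = m[1].str();
    return true;
}
```

Calling quoted_body on the input "say ""hi""" (outer quotes included) stores say ""hi"" in body, while an input with a lone embedded quote such as "a"b" is rejected.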
By "fails", I mean "does not terminate" in a reasonable time. Using gdb I have
seen over 1900 stack frames inside std::regex_search. This is with gcc 11 on
Linux.
I have recast the program using awk and using POSIX regex(3) from the C library,
both of which succeed instantly. I attach a tarball that includes all three
files, the input, and a Makefile to demonstrate them.
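For readers without the tarball, the following is a rough sketch of what calling regex(3) from C++ can look like. It is not the attached program: it uses only the identifier alternative on each side of BY rather than the full pattern, and the name by_operands is hypothetical.

```cpp
#include <regex.h>   // POSIX regcomp/regexec, not the C++ <regex> header
#include <cstddef>
#include <string>

// Hypothetical helper: find "<identifier> BY <identifier>" in `line`
// and copy the two identifiers into `lhs` and `rhs`.
bool by_operands(const std::string& line, std::string& lhs, std::string& rhs)
{
    // Identifier alternative only, taken from the pattern above.
    static const char pattern[] =
        "([[:alnum:]]+([_-]+[[:alnum:]]+)*)"
        "[[:space:]]+BY[[:space:]]+"
        "([[:alnum:]]+([_-]+[[:alnum:]]+)*)";

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;

    regmatch_t m[5];  // whole match plus the 4 capture groups
    const bool found = regexec(&re, line.c_str(), 5, m, 0) == 0;
    if (found) {
        // Groups 1 and 3 are the left and right identifiers.
        lhs = line.substr(static_cast<std::size_t>(m[1].rm_so),
                          static_cast<std::size_t>(m[1].rm_eo - m[1].rm_so));
        rhs = line.substr(static_cast<std::size_t>(m[3].rm_so),
                          static_cast<std::size_t>(m[3].rm_eo - m[3].rm_so));
    }
    regfree(&re);
    return found;
}
```

With the input OLD-NAME BY NEW-NAME this yields OLD-NAME and NEW-NAME as the two operands.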