This does not reproduce with gawk, even when I force use of the regex matcher.
What happens if grep is built with the included regex files instead of relying on the ones in the local glibc? Arnold Dimitry Andric <dimi...@andric.com> wrote: > On 27 Mar 2023, at 11:14, Koen Claessen <k...@chalmers.se> wrote: > > > > Running the command: > > > > echo a | grep -E -w '((()|a)|())*' > > > > does not terminate, and uses a LOT of processor time, for all versions of > > grep I have tried. > > > > This is the smallest case that could be found; simplifying anything in the > > input and/or expression leads to correct behavior. > > Reproducible with GNU grep 3.7 on Ubuntu 22: > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep > -E -w ((()|a)|())* > > It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), > and it is that implementation which ends up in an endless loop. > > It loops between lines 1415, 1417 and 1443, but idx and cur_node never change > from 0: > > 1378 static reg_errcode_t > 1379 __attribute_warn_unused_result__ > 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t > nmatch, > 1381 regmatch_t *pmatch, bool fl_backtrack) > 1382 { > ... > 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > 1416 { > 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, > nmatch); > 1418 > 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > ... > 1442 /* Proceed to next node. */ > 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, > prev_idx_match, > 1444 &idx, cur_node, > 1445 &eps_via_nodes, fs); > 1446 > 1447 if (__glibc_unlikely (cur_node < 0)) > ... > 1465 } > 1466 } > > -Dimitry > > P.S.: Interestingly this does not reproduce with BSD grep, which returns > immediately with "a". >