On 27 Mar 2023, at 11:14, Koen Claessen <k...@chalmers.se> wrote: > > Running the command: > > echo a | grep -E -w '((()|a)|())*' > > does not terminate, and uses a LOT of processor time, for all versions of > grep I have tried. > > This is the smallest case that could be found; simplifying anything in the > input and/or expression leads to correct behavior.
Reproducible with GNU grep 3.7 on Ubuntu 22: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: 1378 static reg_errcode_t 1379 __attribute_warn_unused_result__ 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, 1381 regmatch_t *pmatch, bool fl_backtrack) 1382 { ... 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) 1416 { 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); 1418 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) ... 1442 /* Proceed to next node. */ 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, 1444 &idx, cur_node, 1445 &eps_via_nodes, fs); 1446 1447 if (__glibc_unlikely (cur_node < 0)) ... 1465 } 1466 } -Dimitry P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
signature.asc
Description: Message signed with OpenPGP