Ah sorry, I did indeed rebuild grep with --with-included-regex, and for debugging purposes added CFLAGS="-O0 -g".
In any case, the problematic code is both in glibc and grep, as I believe these are originating from the same source. -Dimitry > On 2 Apr 2023, at 08:52, arn...@skeeve.com wrote: > > Hi. > > Dimitry Andric <dimi...@andric.com> wrote: > >> Yes, it still reproduces when I configure the latest grep using >> --without-included-regex: > > I assume you meant --with-included-regex? > > If you really used --without-included-regex, that doesn't prove anything... > :-) > > It's interesting, as gawk uses the same regex, but with different flags. > It might be worth trying to understand which of the syntax bits > is causing this. > > Thanks, > > Arnold > >> >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) >> 1: idx = 0 >> (gdb) >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, >> nmatch); >> 1: idx = 0 >> (gdb) >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) >> 1: idx = 0 >> (gdb) >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) >> 1: idx = 0 >> (gdb) >> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, >> prev_idx_match, >> 1: idx = 0 >> (gdb) >> 1432 if (__glibc_unlikely (cur_node < 0)) >> 1: idx = 0 >> (gdb) >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) >> 1: idx = 0 >> (gdb) >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, >> nmatch); >> 1: idx = 0 >> (gdb) >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) >> 1: idx = 0 >> (gdb) >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) >> 1: idx = 0 >> >> The endless loop looks the same. grep's regexec.c is mostly the same as >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N >> directives added. >> >> -Dimitry >> >>> On 28 Mar 2023, at 08:46, arn...@skeeve.com wrote: >>> >>> This does not reproduce with gawk, even when I force use of the regex >>> matcher. >>> >>> What happens if grep is built with the included regex files instead of >>> relying on the ones in the local glibc? >>> >>> Arnold >>> >>> Dimitry Andric <dimi...@andric.com> wrote: >>> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <k...@chalmers.se> wrote: >>>>> >>>>> Running the command: >>>>> >>>>> echo a | grep -E -w '((()|a)|())*' >>>>> >>>>> does not terminate, and uses a LOT of processor time, for all versions of >>>>> grep I have tried. >>>>> >>>>> This is the smallest case that could be found; simplifying anything in the >>>>> input and/or expression leads to correct behavior. >>>> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22: >>>> >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >>>> COMMAND >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep >>>> -E -w ((()|a)|())* >>>> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's >>>> regexec(3), and it is that implementation which ends up in an endless loop. >>>> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never >>>> change from 0: >>>> >>>> 1378 static reg_errcode_t >>>> 1379 __attribute_warn_unused_result__ >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, >>>> size_t nmatch, >>>> 1381 regmatch_t *pmatch, bool fl_backtrack) >>>> 1382 { >>>> ... >>>> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) >>>> 1416 { >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, >>>> nmatch); >>>> 1418 >>>> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) >>>> ... >>>> 1442 /* Proceed to next node. */ >>>> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, >>>> prev_idx_match, >>>> 1444 &idx, cur_node, >>>> 1445 &eps_via_nodes, fs); >>>> 1446 >>>> 1447 if (__glibc_unlikely (cur_node < 0)) >>>> ... >>>> 1465 } >>>> 1466 } >>>> >>>> -Dimitry >>>> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns >>>> immediately with "a". >>>> >>
signature.asc
Description: Message signed with OpenPGP