On Mon, Oct 12, 2020 at 4:08 PM Jim Meyering <j...@meyering.net> wrote:
> On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> >
> > We can set RE_NO_SUB for calling regex only to check syntax. It brings
> > performance gains in cases to have a lot of enormous epsilon nodes.
> >
> >
> > $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
> >
> > (before)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 6.15
> > user 4.62
> > sys 1.52
> >
> > (after)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 0.66
> > user 0.19
> > sys 0.46
>
> Thank you.
>
> FYI, when running similar commands with and without your patch (with
> an eye to adding a test), I ran this one (with your patch). It shows
> that using 80,000 terms caused grep to consume 32GB of memory before
> being OOM-killed:
>
>   $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
>   Command terminated by signal 9
>   6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata 32024460maxresident)k
>   6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
>   [Exit 137 (KILL)]
>
> I will come back to this later this week.

We must accept the fact that extreme regular expressions will cause
resource exhaustion like that when processed by classical regex_*
functions. This is yet another good reason to prefer PCRE and to use
grep's -P option. In that case, it fails like this:

  $ printf '(%080000d)\n' | sed 's/0/|/g' |grep -Pf- /dev/null
  grep: regular expression is too large

I have just pushed your patch, but without adding a test.
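
For anyone following along, here is a small-scale sketch of what the pattern generator in the commands above produces. The helper name gen_pat is hypothetical, used only for illustration, and it passes an explicit 0 to printf instead of relying on the missing-argument default:

```shell
# gen_pat N: print a parenthesized group containing N '|' characters,
# i.e. N+1 empty alternatives. (Hypothetical helper, not part of grep.)
gen_pat() {
  # printf zero-pads the number 0 to width N; sed turns each '0' into '|'.
  printf "(%0${1}d)\n" 0 | sed 's/0/|/g'
}

gen_pat 5    # prints (|||||)
```

With 20000 or 80000 in place of 5, this yields the same kind of pattern as the timing and memory runs quoted above.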