On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
>
> We can set RE_NO_SUB for calling regex only to check syntax.  It brings
> performance gains in cases to have a lot of enormous epsilon nodes.
>
>
> $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
>
> (before)
> $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> real 6.15
> user 4.62
> sys 1.52
>
> (after)
> $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> real 0.66
> user 0.19
> sys 0.46

Thank you.
FYI, when running similar commands with and without your patch (with an
eye to adding a test), I ran this one (with your patch). It shows that
using 80,000 terms caused grep to consume 32GB of memory before being
OOM-killed:

  $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
  Command terminated by signal 9
  6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata 32024460maxresident)k
  6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
  [Exit 137 (KILL)]

I will come back to this later this week.
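
For readers who want to see what these pattern generators produce, here is a scaled-down sketch. The helper name `make_pat` and the count of 5 are illustrative assumptions and are not part of the patch or of grep's test suite. With no numeric argument, `printf` formats `%05d` as `00000`, and `sed` then turns each `0` into `|`, so the result is a single group made of empty alternatives.

```shell
# make_pat N: hypothetical helper that prints "(" + N bars + ")".
# It passes an explicit 0 to printf, which yields the same output as
# the argument-less form used in the commands above.
make_pat() {
  printf "(%0${1}d)\n" 0 | sed 's/0/|/g'
}

# Scaled-down pattern: 5 bars instead of 20,000 or 80,000.
pat=$(make_pat 5)
echo "$pat"
```

Feeding the output of `make_pat 20000` or `make_pat 80000` to `src/grep -Ef- /dev/null` should correspond to the two runs shown above. Given the memory use reported for the larger one, it may be prudent to cap memory first, for example with `ulimit -v` in a subshell.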