tags 26193 moreinfo done

On Mon, Mar 20, 2017 at 8:34 AM, John P. Linderman <jpl....@gmail.com> wrote:
> In what follows, file "conjectures" is a 6 billion bytes file in which each
> line contains at most one letter P, and few (see output) have a digit
> following a P. "rusage" is just a home-brew resource usage summary command.
>
> rusage egrep 'P[0-9]' conjectures > xxx
> 695.55 real 688.33 user 2.40 sys 0 pf 186 pr 0 sw 0 rb 8 wb 1 vcx 19206 icx
> 2488 mx 0 ix 0 id 0 is
>
> cat xxx
> A[21]=11{11}:22<LP3
>
> rusage egrep 'P[[:digit:]]' conjectures > xxx
> 14.88 real 13.36 user 1.43 sys 0 pf 186 pr 0 sw 0 rb 8 wb 0 vcx 516 icx
> 2500 mx 0 ix 0 id 0 is
>
> cat xxx
> A[21]=11{11}:22<LP3
>
> Using what is to me the more obvious [0-9] pattern takes almost 50 times as
> long as using the [[:digit:]] pattern. Seems very strange.
...

Thank you for the report.
However, there have been numerous improvements since grep-2.25,
which was released nearly a year ago. The latest is grep-3.0, and
using it, I am unable to reproduce the problem on an input of 333M
lines, each of length 19, and ending in "P":

  $ yes 'A[21]=11{11}:22<LP'| head -333000000 > /dev/shm/k
  $ env time grep 'P[0-9]' /dev/shm/k
  7.84user 2.06system 0:09.90elapsed 100%CPU (0avgtext+0avgdata 2008maxresident)k
  0inputs+0outputs (0major+97minor)pagefaults 0swaps
  [Exit 1]
  $ env time grep 'P[[:digit:]]' /dev/shm/k
  7.86user 1.96system 0:09.83elapsed 99%CPU (0avgtext+0avgdata 2004maxresident)k
  0inputs+0outputs (0major+97minor)pagefaults 0swaps
  [Exit 1]
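
For a quick check on another machine, the comparison above might be scaled down roughly as follows. This is only a sketch: the line count N, the temporary file, and the single appended line ending in "P3" (mirroring the one match in the original report) are illustrative assumptions, not part of the test above. It confirms that both patterns select the same lines; to compare run times, each grep invocation could be run under "env time" as in the transcript.

```shell
#!/bin/sh
# Scaled-down sketch of the comparison above.
# N, the temp file, and the appended "P3" line are assumptions for illustration.
N=${N:-100000}
f=$(mktemp) || exit 1

# N identical lines ending in "P", with no digit after the P.
yes 'A[21]=11{11}:22<LP' | head -n "$N" > "$f"
# One line where a digit follows the P, like the match in the report.
printf '%s\n' 'A[21]=11{11}:22<LP3' >> "$f"

# Count matching lines for each spelling of the pattern.
# "|| true" keeps grep's exit status 1 (no match) from aborting under "set -e".
count_range=$(grep -c -- 'P[0-9]' "$f" || true)
count_class=$(grep -c -- 'P[[:digit:]]' "$f" || true)

echo "P[0-9]:        $count_range matching line(s)"
echo "P[[:digit:]]:  $count_class matching line(s)"

rm -f "$f"
```

If the two counts differ, or one invocation is dramatically slower than the other, the output of "grep --version" would be useful to include in a follow-up.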