bug#26193: [0-9] versus [[:digit:]]

Jim Meyering Mon, 20 Mar 2017 12:53:21 -0700

tags 26193 moreinfo
done

On Mon, Mar 20, 2017 at 8:34 AM, John P. Linderman <jpl....@gmail.com> wrote:
> In what follows, file "conjectures" is a 6-billion-byte file in which each
> line contains at most one letter P, and few (see output) have a digit
> following a P. "rusage" is just a home-brew resource usage summary command.
>
>   rusage egrep 'P[0-9]' conjectures > xxx
> 695.55 real 688.33 user 2.40 sys 0 pf 186 pr 0 sw 0 rb 8 wb 1 vcx 19206 icx 2488 mx 0 ix 0 id 0 is
>
>   cat xxx
> A[21]=11{11}:22<LP3
>
>   rusage egrep 'P[[:digit:]]' conjectures > xxx
> 14.88 real 13.36 user 1.43 sys 0 pf 186 pr 0 sw 0 rb 8 wb 0 vcx 516 icx 2500 mx 0 ix 0 id 0 is
>
>   cat xxx
> A[21]=11{11}:22<LP3
>
> Using what is to me the more obvious [0-9] pattern takes almost 50 times as
> long as using the [[:digit:]] pattern. Seems very strange.
...


Thank you for the report. However, there have been numerous
improvements since grep-2.25, which was released nearly a year ago.
The latest is grep-3.0, and using it, I am unable to reproduce the
problem on an input of 333M lines, each 19 bytes long (including the
newline) and ending in "P":

  $ yes 'A[21]=11{11}:22<LP'| head -333000000 > /dev/shm/k
  $ env time grep 'P[0-9]' /dev/shm/k
  7.84user 2.06system 0:09.90elapsed 100%CPU (0avgtext+0avgdata 2008maxresident)k
  0inputs+0outputs (0major+97minor)pagefaults 0swaps
  [Exit 1]
  $ env time grep 'P[[:digit:]]' /dev/shm/k
  7.86user 1.96system 0:09.83elapsed 99%CPU (0avgtext+0avgdata 2004maxresident)k
  0inputs+0outputs (0major+97minor)pagefaults 0swaps
  [Exit 1]
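
To repeat the comparison on a smaller machine, something like the
following may help. It is only a minimal sketch of the test above, not the
commands actually run: it assumes 1,000,000 lines by default rather than
333M, writes to a mktemp file rather than /dev/shm/k, and introduces two
hypothetical environment variables, LINES_WANTED and TIMER, purely for
illustration.

```shell
#!/bin/sh
# Scaled-down sketch of the comparison above. Assumptions: 1,000,000 lines
# by default instead of 333M, and a mktemp file instead of /dev/shm/k.
# LINES_WANTED and TIMER are hypothetical knobs introduced for this sketch.
lines=${LINES_WANTED:-1000000}
timer=${TIMER:-}          # e.g. TIMER='env time' to print timings on stderr

input=$(mktemp) || exit 1
trap 'rm -f "$input"' EXIT

# Same line as in the test above: no line has a digit after the "P".
yes 'A[21]=11{11}:22<LP' | head -n "$lines" > "$input"

# grep -c prints the number of matching lines. grep exits with status 1
# when nothing matches, which is expected here, hence the "|| true".
count_range=$($timer grep -c 'P[0-9]' "$input" || true)
count_class=$($timer grep -c 'P[[:digit:]]' "$input" || true)

echo "P[0-9]:        $count_range matching lines"
echo "P[[:digit:]]:  $count_class matching lines"
```

Running it with TIMER='env time' should print one timing report per
pattern, in the same style as the output above. Both counts should be 0,
since no line has a digit after the "P". Setting LINES_WANTED=333000000
approximates the original input size, given enough space in the temporary
directory.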
