On 03/12/2016 04:17 AM, Jim Meyering wrote:
> [resending to keep the list on Cc]
> On Thu, Mar 10, 2016 at 10:05 PM, JQK <jq...@redhat.com> wrote:
>> On 03/11/2016 01:26 AM, Jim Meyering wrote:
>>> On Thu, Mar 10, 2016 at 3:00 AM, JQK <jq...@redhat.com> wrote:
>>>> If in the following situation,
>>>>
>>>> ===========
>>>> file1 has numbers from 1 to 200000, 200000 lines
>>>> file2 has several lines(about 200 ~300lines) of random numbers in the
>>>> range of 1-200000
>>>> ===========
>>>>
>>>> The time cost for finishing the following command could be over 15
>>>> minutes on linux -- a little huge.
>>>>
>>>> $ grep -v -f file1 file2
>>>>
>>>> (FYI, on AIX it could only be less than 1 second)
>>>>
>>>> Maybe there is also a room for optimization not only on the memory usage
>>>> but also on the time cost.
>>>
>>> What version of grep are you using?
>>> With the latest (grep-2.23), this takes
>>> less than 1.5s on a core-i7-4770S-based system:
>>>
>>> $ env time grep -v -f <(seq 200000) <(shuf -i 1-200000 -n 250)
>>> 1.27user 0.16system 0:01.43elapsed 100%CPU (0avgtext+0avgdata
>>> 839448maxresident)k
>>> 0inputs+0outputs (0major+233108minor)pagefaults 0swaps
>>
>> Sorry.
>> In my situation, the grep command could be a little different, the
>> command is:
>>
>> # grep -w -f file1 file2
>
> The command I provided is stand-alone, and equivalent to
> what you described, except that it generates the two
> input files as part of the command. However, the cost of
> generating those two inputs is minimal. The <(...) notation
> is a feature called process substitution. It should work
> both with bash and with zsh.
>
> Please show the precise commands (and output) that
> you used to produce the inputs and to time the grep
> invocation.
>
>> Also after testing with the latest grep-2.23, it could slow.
>
> I don't understand the above. Please rephrase.
> If you used a system-provided version of grep,
> tell us what "rpm -q grep" prints.
>
The testing is as follows:

【grep version】
# grep -V
grep (GNU grep) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see
<http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

【without option "-F"】
# env time grep -w -f <(seq 200000) <(shuf -i 1-200000 -n 250)
:
288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata
3492784maxresident)k
8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps

【with option "-F"】
# env time grep -F -w -f <(seq 200000) <(shuf -i 1-200000 -n 250)
:
0.10user 0.01system 0:00.22elapsed 53%CPU (0avgtext+0avgdata
87856maxresident)k
0inputs+0outputs (0major+5534minor)pagefaults 0swaps

-- 
Junkui Quan (JQK)
www.redhat.com
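
The two timed runs above differ only in "-F". Because every pattern is a plain decimal number with no regex metacharacters, both modes should select the same lines. The following is a minimal, scaled-down sketch for checking that without waiting ten minutes. The sizes N and M and the temporary file names are illustrative assumptions, not values from this thread, and it uses temporary files instead of <(...) so that it also runs under plain sh.

```shell
#!/bin/sh
# Scaled-down sketch: N and M are assumed values chosen so that the
# non -F run finishes quickly; the thread used 200000 and 250.
N=2000
M=25

tmp=$(mktemp -d)
seq "$N" > "$tmp/patterns"               # one pattern per line: 1..N
shuf -i 1-"$N" -n "$M" > "$tmp/input"    # M random numbers in 1..N

# Same options as in the thread, without and with -F.
regex_out=$(grep -w -f "$tmp/patterns" "$tmp/input")
fixed_out=$(grep -F -w -f "$tmp/patterns" "$tmp/input")

if [ "$regex_out" = "$fixed_out" ]; then
    echo "same lines selected with and without -F"
else
    echo "outputs differ"
fi

rm -rf "$tmp"
```

To compare timings rather than output, each grep call can be prefixed with "env time" as in the runs above, and N can be raised toward 200000.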