On 03/11/2016 01:26 AM, Jim Meyering wrote:
> On Thu, Mar 10, 2016 at 3:00 AM, JQK <jq...@redhat.com> wrote:
>> Consider the following situation:
>>
>> ===========
>> file1 has the numbers from 1 to 200000, one per line (200000 lines)
>> file2 has a few hundred lines (about 200-300) of random numbers in
>> the range 1-200000
>> ===========
>>
>> The time cost of the following command can exceed 15 minutes on
>> Linux, which seems excessive:
>>
>> $ grep -v -f file1 file2
>>
>> (FYI, on AIX it takes less than 1 second.)
>>
>> Maybe there is room for optimization not only in memory usage but
>> also in run time.
>
> What version of grep are you using?
> With the latest (grep-2.23), this takes
> less than 1.5s on a core-i7-4770S-based system:
>
> $ env time grep -v -f <(seq 200000) <(shuf -i 1-200000 -n 250)
> 1.27user 0.16system 0:01.43elapsed 100%CPU (0avgtext+0avgdata
> 839448maxresident)k
> 0inputs+0outputs (0major+233108minor)pagefaults 0swaps
Sorry, in my situation the grep command is slightly different:

# grep -w -f file1 file2

Also, after testing with the latest grep-2.23, it is still slow:

# env time grep -w -f <(seq 200000) <(shuf -i 1-200000 -n 250)

--
Junkui Quan (JQK)
www.redhat.com
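P.S. One diagnostic sketch, untested on my side: since the patterns here are plain numeric strings, adding -F to force fixed-string matching may sidestep whatever slow path -w with many patterns is hitting. This assumes the cost is in the pattern-matching engine rather than in I/O.

# env time grep -Fw -f <(seq 200000) <(shuf -i 1-200000 -n 250)

If the -F variant drops back to around a second, that would point at the handling of -w in the matcher as the culprit.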