"fynali" <[EMAIL PROTECTED]> writes:
> Hi all,
>
> I have two files:

Others have pointed out the Python solution - use a set instead of a
list for membership testing. I want to point out a better Unix
solution ('cause I probably wouldn't have written a Python program to
do this):

> Objective: to remove the numbers present in barred-list from the
> PSPfile.
>
>     $ ls -lh PSP0000320.dat CBR0000319.dat
>     ...  56M Dec 28 19:41 PSP0000320.dat
>     ... 8.6M Dec 28 19:40 CBR0000319.dat
>
>     $ wc -l PSP0000320.dat CBR0000319.dat
>     4,462,603 PSP0000320.dat
>     693,585 CBR0000319.dat
>
> I wrote the following in bash to do the same:
>
>     #!/bin/bash
>
>     ARGS=2
>
>     if [ $# -ne $ARGS ] # takes two arguments
>     then
>         echo; echo "Usage: `basename $0` {PSPfile} {CBRfile}"
>         echo; echo " eg.: `basename $0` PSP0000320.dat CBR0000319.dat"; echo;
>         echo "NOTE: first argument: PSP file, second: CBR file";
>         echo " this script _does_ no_ input validation!"
>         exit 1
>     fi;
>
>     # fix prefix; cost: 12.587 secs
>     cat $1 | sed -e 's/^0*/966/' > $1.good
>     cat $2 | sed -e 's/^0*/966/' > $2.good
>
>     # sort/save files; for the 4,462,603 lines, cost: 36.589 secs
>     sort $1.good > $1.sorted
>     sort $2.good > $2.sorted
>
>     # diff -y {PSP} {CBR}, grab the ones in PSPfile; cost: 31.817 secs
>     diff -y $1.sorted $2.sorted | grep "<" > $1.filtered
>
>     # remove trailing junk [spaces & <]; cost: 1 min 3 secs
>     cat $1.filtered | sed -e 's/\([0-9]*\) *</\1/' > $1.cleaned
>
>     # remove intermediate files, good, sorted, filtered
>     rm -f *.good *.sorted *.filtered
>     #:~
>
> ...but strangely though, there's a discrepancy, the reason for which I
> can't figure out!

The above script can be shortened quite a bit:

    #!/bin/bash
    # <(...) is process substitution, which needs bash rather than plain sh.
    # comm -23 drops lines unique to the second input and lines common to both.
    comm -23 <(sed 's/^0*/966/' "$1" | sort) <(sed 's/^0*/966/' "$2" | sort)

This outputs only the lines that occur in $1 but not in $2. It also
runs the seds and sorts in parallel, which can make a significant
difference in the wall-clock time it takes to get the job done.

The Python version is probably faster, since it doesn't sort the data.
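
For comparison, the set-based Python approach mentioned above might look
roughly like the sketch below. It is not the code posted elsewhere in the
thread: the function names normalize and remove_barred are made up for
illustration, and it assumes one number per line and the same 966 prefix
fix that the sed commands apply.

```python
import re


def normalize(number):
    """Mimic sed 's/^0*/966/': swap any leading zeros for the 966 prefix."""
    # count=1 keeps this to a single substitution at the start of the string.
    return re.sub(r"^0*", "966", number.strip(), count=1)


def remove_barred(psp_lines, cbr_lines):
    """Return the normalized PSP numbers that do not appear in the barred list.

    Both arguments are iterables of lines (hypothetical inputs, e.g. open files).
    """
    # A set gives constant-time membership tests on average,
    # so nothing has to be sorted.
    barred = {normalize(line) for line in cbr_lines if line.strip()}
    kept = []
    for line in psp_lines:
        if not line.strip():
            continue  # skip blank lines
        number = normalize(line)
        if number not in barred:
            kept.append(number)
    return kept
```

It could be called with two open files, for example
remove_barred(open(psp_path), open(cbr_path)), where psp_path and cbr_path
stand for the two .dat files. Unlike the comm pipeline, this keeps the PSP
file's original order, and a barred number is removed however many times it
occurs.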

    <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>          http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.