On Jan 26, 10:51 am, Jeff McNeil <j...@jmcneil.net> wrote:
> On Jan 26, 10:22 am, redbaron <ivanov.ma...@gmail.com> wrote:
>
> > I have one big (6.9 GB) .gz file with text inside it.
> > zcat bigfile.gz > /dev/null does the job in 4 minutes 50 seconds.
>
> > Python code has been doing the same job for 25 minutes and still
> > hasn't finished =( The code is the simplest I could imagine:
>
> > def main():
> >     fh = gzip.open(sys.argv[1])
> >     all(fh)
>
> > As far as I understand, most of the time is spent executing C code, so
> > Python's overhead shouldn't be noticeable. Why is it so slow?
>
> Look at what's happening in both operations. The zcat operation simply
> uncompresses your data and dumps it directly to /dev/null. Nothing is
> done with the data as it's uncompressed.
>
> On the other hand, when you call 'all(fh)', you're iterating through
> every line in bigfile.gz. In other words, you're reading the file and
> scanning it for newlines, rather than simply running the decompression.
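The two access patterns contrasted in the reply above can be sketched in modern Python 3, where gzip.open in "rb" mode yields bytes. This is a minimal illustration under stated assumptions, not code from this thread: the function names count_newlines_chunked and count_lines_iterating are hypothetical, and the 8192-byte block size is an arbitrary choice. Both functions do the same decompression work; only the first avoids building one object per line.

```python
import gzip
import os
import tempfile

# Hypothetical block size for the sketch; any reasonably large value works.
CHUNK_SIZE = 8192


def count_newlines_chunked(path):
    """Decompress in fixed-size blocks and count newline bytes.

    Close to what 'zcat file > /dev/null' does: the data is decompressed in
    large blocks, and bytes.count() scans each block without creating one
    object per line.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            block = fh.read(CHUNK_SIZE)
            if not block:  # b"" signals end of the decompressed stream
                break
            total += block.count(b"\n")
    return total


def count_lines_iterating(path):
    """Iterate over the file line by line, as all(fh) does.

    Every iteration has to find the next newline and return a new bytes
    object, so per-line overhead grows with the number of lines.
    """
    with gzip.open(path, "rb") as fh:
        return sum(1 for _ in fh)


if __name__ == "__main__":
    # Build a small sample file; every line ends with b"\n", so both counts agree.
    sample = os.path.join(tempfile.mkdtemp(), "sample.gz")
    with gzip.open(sample, "wb") as out:
        for i in range(100000):
            out.write(b"line %d\n" % i)
    print(count_newlines_chunked(sample), count_lines_iterating(sample))  # 100000 100000
```

Note that the two counts differ by one when the data does not end with a newline, because the final partial line is still yielded by iteration but contains no b"\n" byte.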
The File:
----------------------------------------------------
[j...@marvin ~]$ ls -alh junk.gz
-rw-rw-r-- 1 jeff jeff 113M 2009-01-26 10:42 junk.gz
[j...@marvin ~]$

The 'zcat' time:
----------------------------------------------------
[j...@marvin ~]$ time zcat junk.gz > /dev/null

real    0m2.390s
user    0m2.296s
sys     0m0.093s
[j...@marvin ~]$

Test Script #1:
----------------------------------------------------
import sys
import gzip

fs = gzip.open('junk.gz')
data = fs.read(8192)
while data:
    sys.stdout.write(data)
    data = fs.read(8192)

Test Script #1 Time:
----------------------------------------------------
[j...@marvin ~]$ time python test9.py >/dev/null

real    0m3.681s
user    0m3.201s
sys     0m0.478s
[j...@marvin ~]$

Test Script #2:
----------------------------------------------------
import sys
import gzip

fs = gzip.open('junk.gz')
all(fs)

Test Script #2 Time:
----------------------------------------------------
[j...@marvin ~]$ time python test10.py

real    1m51.764s
user    1m51.475s
sys     0m0.245s
[j...@marvin ~]$

--
http://mail.python.org/mailman/listinfo/python-list