On Fri, Nov 24, 2006 at 10:11:06AM +0000, Soeren Sonnenburg wrote:
> Dear all,
>
> I am a bit puzzled, as
>
> -----snip-----
> import bz2
> f=bz2.BZ2File('data/data.bz2');
>
> while f.readline():
>     pass
> -----snip-----
>
> takes twice the time (10 seconds) to read/decode a bz2 file
> compared to
>
> -----snip-----
> import bz2
> f=bz2.BZ2File('data/data.bz2');
> x=f.readlines()
> -----snip-----
>
> (5 seconds). This is even more strange as the help(bz2) says:
>
> | readlines(...)
> |     readlines([size]) -> list
> |
> |     Call readline() repeatedly and return a list of lines read.
> |     The optional size argument, if given, is an approximate bound on the
> |     total number of bytes in the lines returned.
>
> This happens on python2.3 - python2.5 and it does not help to specify a
> maximum line size.
>
> Any ideas ?

The bz2 module is implemented in C, so calling "f.readline()" repeatedly
pays the Python => C call overhead once per line. "f.readlines()" doesn't
have that overhead: it is a single call that stays in a tight C loop the
whole time. The docstring describes what readlines() returns, not how it
is implemented.

-Jack
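
If you want to check the difference on your own machine, here is a rough sketch of a comparison, assuming a current Python 3 (the original report was against python2.3 - python2.5, so the numbers need not match). The helper names (make_sample, count_with_readline, count_with_readlines, count_with_iteration, compare) and the generated sample data are made up for illustration; only the bz2, os, tempfile and time calls come from the standard library. It also times plain iteration over the file object, which is the usual choice when the whole file should not be held in memory.

```python
import bz2
import os
import tempfile
import time


def make_sample(path, n_lines=20000):
    """Write a bz2 file of n_lines short lines (hypothetical test data)."""
    with bz2.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of sample data\n" % i)


def count_with_readline(path):
    """One Python-level method call per line."""
    n = 0
    with bz2.BZ2File(path) as f:
        while f.readline():
            n += 1
    return n


def count_with_readlines(path):
    """A single call that returns every line in one list."""
    with bz2.BZ2File(path) as f:
        return len(f.readlines())


def count_with_iteration(path):
    """Iterate over the file object; no full list is built."""
    n = 0
    with bz2.BZ2File(path) as f:
        for _ in f:
            n += 1
    return n


def compare(n_lines=20000):
    """Return {function name: (line count, elapsed seconds)}."""
    fd, path = tempfile.mkstemp(suffix=".bz2")
    os.close(fd)
    results = {}
    try:
        make_sample(path, n_lines)
        for func in (count_with_readline,
                     count_with_readlines,
                     count_with_iteration):
            start = time.perf_counter()
            count = func(path)
            results[func.__name__] = (count, time.perf_counter() - start)
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    for name, (count, seconds) in compare().items():
        print("%-22s %6d lines  %.4f s" % (name, count, seconds))
```

All three functions should report the same line count; only the elapsed time differs, and how much it differs depends on the Python version and the size of the file.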