On Apr 30, 9:41 am, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
> On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
> > Hi!
> > I have a really long binary file that I want to read.
> > The way I am doing it now is:
> >
> > for i in xrange(N):  # N is about 10,000,000
> >     time = struct.unpack('=HHHH', infile.read(8))
> >     # do something
> >     tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))
>
> I assume that is supposed to be infile.read()
>
> >     # do something
> >
> > Each loop takes about 0.2 ms in my computer, which means the whole
> > for loop takes 2000 seconds.
>
> You're reading 400 million bytes, or 400MB, in about half an hour.
> Whether that's fast or slow depends on what the "do something" lines
> are doing.
>
> > I would like it to run faster.
> > Do you have any suggestions?
>
> Disk I/O is slow, so don't read from files in tiny little chunks. Read
> a bunch of records into memory, then process them.
>
> # UNTESTED!
> rsize = 8 + 32  # record size
> for i in xrange(N//1000):
>     buffer = infile.read(rsize*1000)  # read 1000 records at once
>     for j in xrange(1000):  # process each record
>         offset = j*rsize
>         time = struct.unpack('=HHHH', buffer[offset:offset+8])
>         # do something
>         tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
>         # do something
>
> (Now I'm just waiting for somebody to tell me that file.read() already
> buffers reads...)
>
> --
> Steven D'Aprano
I think the file.read() already buffers reads... :)
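Since the reads are already buffered, the chunked version mostly saves on the
number of Python-level read() calls; at 10,000,000 records the bigger cost is
probably the per-record Python overhead (slicing the buffer and re-handing the
format string to struct.unpack each time). Precompiling the formats with
struct.Struct and using unpack_from on a big buffer might help more. Untested
sketch, assuming the 40-byte record layout quoted above (process, nrecords and
chunk are just illustrative names, and the file is assumed to hold exactly
nrecords complete records):

import struct

TIME = struct.Struct('=HHHH')       # 8 bytes
TDC = struct.Struct('=LiLiLiLi')    # 32 bytes
RSIZE = TIME.size + TDC.size        # 40 bytes per record

def process(infile, nrecords, chunk=100000):
    # Read up to `chunk` records at a time and unpack straight out of
    # the buffer, so we never slice or re-parse a format string.
    done = 0
    while done < nrecords:
        buf = infile.read(RSIZE * min(chunk, nrecords - done))
        if not buf:
            break
        for offset in xrange(0, len(buf), RSIZE):
            time = TIME.unpack_from(buf, offset)
            tdc = TDC.unpack_from(buf, offset + TIME.size)
            # do something with time and tdc
        done += len(buf) // RSIZE

If the per-record work is simple enough, numpy.fromfile with a record dtype
would probably beat anything hand-rolled with struct, but that's a bigger
change to the surrounding code.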