On 18 Mar 2007 19:01:27 -0700, George Sakkis <[EMAIL PROTECTED]> wrote:
> On Mar 18, 12:11 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> > I need to process a really huge text file (4GB) and this is what I
> > need to do. It takes forever to complete this. I read somewhere that
> > "list comprehension" can speed things up. Can you point out how to do
> > it in this case?
> > thanks a lot!
> >
> > f = open('file.txt', 'r')
> > for line in f:
> >     db[line.split(' ')[0]] = line.split(' ')[-1]
> > db.sync()
>
> You got several good suggestions; one that has not been mentioned but
> makes a big (or even the biggest) difference for large/huge files is
> the buffering parameter of open(). Set it to the largest value you can
> afford to keep the I/O as low as possible. I'm processing 15-25 GB
> files (you see, "huge" is really relative ;-)) on 2-4 GB RAM boxes, and
> setting a big buffer (1 GB or more) reduces the wall time by 30 to 50%
> compared to the default value. BerkeleyDB should have a buffering
> option too; make sure you use it and don't synchronize on every line.

Can you give an example of how you process the 15-25 GB files with the
buffering parameter? It would be educational for everyone, I think.
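Here is roughly what I have pieced together from the replies so far -- an
untested sketch, with made-up file names, a guessed 1 GB buffer, and the
assumption that the stdlib bsddb module is what stands behind `db`:

    import bsddb                     # assuming the stdlib BerkeleyDB binding is what's in use

    ONE_GB = 1024 * 1024 * 1024      # buffer size is just a guess; use what your RAM allows

    # 'data.db' and 'file.txt' are placeholder names, not anything from the original post.
    db = bsddb.hashopen('data.db', 'c')
    f = open('file.txt', 'r', ONE_GB)    # third argument of open() is the buffer size in bytes

    for n, line in enumerate(f):
        fields = line.split(' ')          # split once instead of twice per line
        db[fields[0]] = fields[-1]
        if (n + 1) % 100000 == 0:         # sync every 100,000 lines, not on every line
            db.sync()

    db.sync()                             # catch whatever is left after the last full batch
    f.close()
    db.close()

The idea being that one big sequential read buffer keeps the disk streaming,
while batching the syncs keeps BerkeleyDB from flushing on every key.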
I changed the sync to once every 100,000 lines. Thanks a lot, everyone!
--
http://mail.python.org/mailman/listinfo/python-list