On Mar 18, 12:11 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> Hi
> I need to process a really huge text file (4GB) and this is what I
> need to do. It takes forever to complete this. I read somewhere that
> "list comprehension" can speed things up. Can you point out how to do
> it in this case?
> thanks a lot!
>
> f = open('file.txt','r')
> for line in f:
>     db[line.split(' ')[0]] = line.split(' ')[-1]
> db.sync()

You got several good suggestions; one that has not been mentioned but
makes a big (or even the biggest) difference for large/huge files is
the buffering parameter of open(). Set it to the largest value you can
afford to keep the I/O as low as possible. I'm processing 15-25 GB
files (you see, "huge" is really relative ;-)) on 2-4 GB RAM boxes,
and setting a big buffer (1 GB or more) reduces the wall time by 30 to
50% compared to the default value. BerkeleyDB should have a buffering
option too; make sure you use it, and don't synchronize on every line.

Best,
George
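
P.S. Here is a rough, untested sketch of what I mean, assuming Python 2
and the bsddb module; the file names, buffer size and sync interval are
just placeholders to tune for your box, and I'm also assuming your bsddb
build accepts hashopen()'s cachesize argument:

    import bsddb

    BUF_SIZE = 256 * 1024 * 1024      # 256 MB read buffer; raise it if you have the RAM

    # cachesize gives BerkeleyDB its own large buffer
    # (drop the argument if your bsddb version doesn't support it)
    db = bsddb.hashopen('file.db', 'c', cachesize=512 * 1024 * 1024)

    f = open('file.txt', 'r', BUF_SIZE)   # third argument = buffer size in bytes
    for n, line in enumerate(f):
        fields = line.split(' ')          # split once instead of twice per line
        db[fields[0]] = fields[-1]
        if n % 1000000 == 0:              # sync every million lines, not on each one
            db.sync()
    f.close()

    db.sync()                             # one final sync before closing
    db.close()

That's only the general shape; the right buffer and sync numbers depend
on how much RAM you can spare.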