per wrote:
> i have a program that essentially loops through a text file that's
> about 800 MB in size containing tab-separated data... my program
> parses this file and stores its fields in a dictionary of lists.
>
>     for line in file:
>         split_values = line.strip().split('\t')
>         # do stuff with split_values
>
> currently, this is very slow in python, even if all i do is break up
> each line using split() and store its values in a dictionary, indexing
> by one of the tab-separated values in the file.
>
> is this just an overhead of python that's inevitable? do you guys
> think that switching to cython might speed this up, perhaps by
> optimizing the main for loop? or is this not a viable option?
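Just to make sure we're talking about the same thing, here is a minimal
runnable sketch of the loop you describe. The file name 'data.tsv' and
the choice of the first column as the key are assumptions on my part,
since your post doesn't say which field you index by:

    import collections

    # dictionary of lists, keyed by one of the tab-separated fields
    data = collections.defaultdict(list)

    with open('data.tsv') as f:  # 'data.tsv' is a placeholder name
        for line in f:
            split_values = line.rstrip('\n').split('\t')
            key = split_values[0]  # assuming the first field is the key
            data[key].append(split_values)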
For the general approach and the overall speed of your program it matters what you want to do with the data once you've read it -- can you tell us a bit about that?

Peter