I have a program that essentially loops through a text file, about
800 MB in size, containing tab-separated data... my program parses
this file and stores its fields in a dictionary of lists.
for line in file:
    split_values = line.strip().split('\t')
    # do stuff with split_values
Currently, this is very slow in Python, even if all I do is break up
each line using split() and store its values in a dictionary,
indexing by one of the tab-separated values in the file.
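Roughly what I'm doing looks something like this (a minimal sketch;
the file name and the choice of the first field as the key are just
placeholders):

from collections import defaultdict

rows_by_key = defaultdict(list)        # key field -> list of rows
f = open('data.txt', 'r')              # placeholder file name
for line in f:
    split_values = line.strip().split('\t')
    # index by the first field (placeholder choice of key column)
    rows_by_key[split_values[0]].append(split_values)
f.close()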
I'm not sure what the situation is, but I regularly skim through
tab-delimited files of similar size and haven't noticed any
problems like you describe. You might try tweaking the optional
(and infrequently specified) bufsize parameter of the
open()/file() call:
bufsize = 4 * 1024 * 1024 # buffer 4 megs at a time
f = file('in.txt', 'r', bufsize)
for line in f:
    split_values = line.strip().split('\t')
    # do stuff with split_values
If not specified, you're at the mercy of the system default
(perhaps OS-specific?). You can read more at [1], along with the
associated warning about setvbuf().
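(A rough Python 3 equivalent, for what it's worth, since file() is
gone there; the third argument to open() is still the buffer size:)

bufsize = 4 * 1024 * 1024  # buffer 4 megs at a time
with open('in.txt', 'r', bufsize) as f:
    for line in f:
        split_values = line.strip().split('\t')
        # do stuff with split_values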
-tkc
[1] http://docs.python.org/library/functions.html#open