On 11/30/2010 04:29 AM, OW Ghim Siong wrote:
a=open("bigfile")
matrix=[]
while True:
    lines = a.readlines(100000000)
    for line in lines:
        data=line.split("\t")
        if several_conditions_are_satisfied:
            matrix.append(data)
    print "Number of lines read:", len(lines), "matrix.__sizeof__:", matrix.__sizeof__()
    if len(lines)==0:
        break
As others have mentioned, don't use .readlines(); iterate over
the file object instead. This can even be rewritten as a simple
list comprehension:
from csv import reader
matrix = [data
    for data
    in reader(file('bigfile.txt', 'rb'), delimiter='\t')
    if several_conditions_are_satisfied(data)
    ]
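For anyone on Python 3, where file() no longer exists and the csv module wants a text-mode file opened with newline='', here is a rough sketch of the same idea in both forms: a plain loop over the file object, and the csv-based list comprehension. The keep_row() predicate and both function names are made up for illustration; keep_row() only stands in for whatever several_conditions_are_satisfied really checks. Unlike the original loop, the plain version strips the trailing newline before splitting, so both functions return the same rows.

```python
import csv


def keep_row(fields):
    """Hypothetical filter; a stand-in for several_conditions_are_satisfied."""
    return len(fields) > 1 and fields[1] == "keep"


def filter_by_iteration(path, predicate=keep_row):
    """Loop over the file object itself, so lines are read one at a time."""
    matrix = []
    with open(path) as handle:
        for line in handle:
            # Strip the newline so the last field is clean (an added step).
            fields = line.rstrip("\n").split("\t")
            if predicate(fields):
                matrix.append(fields)
    return matrix


def filter_with_csv(path, predicate=keep_row):
    """Same filtering, with csv.reader doing the tab splitting."""
    # newline="" is what the csv docs recommend for files given to csv.reader.
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if predicate(row)]
```

Either way, only the rows that pass the filter are kept; the rejected lines are discarded as soon as they have been examined.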
The list comprehension assumes that you're throwing away most of
the data, so that the final "matrix" fits well within memory even
if the source file doesn't.
-tkc
--
http://mail.python.org/mailman/listinfo/python-list