On 11/30/2010 04:29 AM, OW Ghim Siong wrote:
a=open("bigfile")
matrix=[]
while True:
     lines = a.readlines(100000000)
     for line in lines:
         data=line.split("\t")
         if several_conditions_are_satisfied:
             matrix.append(data)
     print "Number of lines read:", len(lines), "matrix.__sizeof__:", matrix.__sizeof__()
     if len(lines)==0:
         break

As others have mentioned, don't use .readlines(); use the file object as an iterator instead, so that only one line is held in memory at a time. This can even be rewritten as a simple list comprehension:

  from csv import reader
  matrix = [data
    for data
    in reader(file('bigfile.txt', 'rb'), delimiter='\t')
    if several_conditions_are_satisfied(data)
    ]

This assumes that you're throwing away most of the data, so that the final "matrix" fits well within memory even if the source file doesn't.
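
For anyone on Python 3, where file() no longer exists, here is a minimal sketch of the same idea. The names keep_row and load_matrix are made up for illustration, and keep_row is only a hypothetical stand-in for whatever several_conditions_are_satisfied actually checks:

```python
import csv

def keep_row(data):
    # Hypothetical stand-in for several_conditions_are_satisfied():
    # keep rows that have at least two fields and whose second field is "keep".
    return len(data) >= 2 and data[1] == "keep"

def load_matrix(path, predicate=keep_row):
    # newline="" is the mode the csv module documentation recommends
    # for files handed to csv.reader.
    with open(path, newline="") as f:
        # csv.reader pulls lines from the file iterator one at a time,
        # so only the rows that pass the predicate accumulate in memory.
        return [data for data in csv.reader(f, delimiter="\t") if predicate(data)]
```

Calling load_matrix("bigfile.txt") would then build the filtered list without ever holding the whole file in memory. The with-block also closes the file once the list is built, which the one-liner above leaves to the garbage collector.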

-tkc



--
http://mail.python.org/mailman/listinfo/python-list
