Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Ben Finney
OW Ghim Siong writes: > I have a big file, 1.5GB in size, with about 6 million lines of tab-delimited data. I have to perform some filtering on the data and keep the good rows. After filtering, about 5.5 million rows remain. As you might have already guessed, I have to read them in batches ...

Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Antoine Pitrou
On Tue, 30 Nov 2010 18:29:35 +0800, OW Ghim Siong wrote: > Does anyone know why there is such a big difference in memory usage when storing the matrix as a list of lists versus as a list of strings? That's because any object has a fixed overhead (related to metadata and allocation) ...
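
That fixed overhead is easy to demonstrate with sys.getsizeof. A small sizing experiment, added here for illustration rather than taken from the thread (written in Python 2 to match the quoted code; exact byte counts vary by platform and interpreter version):

    import sys

    line = "field1\tfield2\tfield3\tfield4\tfield5"
    fields = line.split("\t")

    one_string = sys.getsizeof(line)
    list_of_fields = sys.getsizeof(fields) + sum(sys.getsizeof(f)
                                                 for f in fields)
    print "as one string:      ", one_string, "bytes"
    print "as a list of fields:", list_of_fields, "bytes"

The list version pays for the list object itself plus a full string header per field; multiplied across 5.5 million rows, that difference dominates the memory footprint.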

Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Tim Chase
On 11/30/2010 04:29 AM, OW Ghim Siong wrote:

    a = open("bigfile")
    matrix = []
    while True:
        lines = a.readlines(1)
        if not lines:   # stop at end of file; the loop never exits otherwise
            break
        for line in lines:
            data = line.split("\t")
            if several_conditions_are_satisfied:
                matrix.append(data)
        print "Number of lines read:", len(lines)
...
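
Tim's own comments are cut off above. As a sketch of one common refinement (not necessarily what he suggested), the loop can iterate the file object directly and keep each surviving row as a single string, splitting it back into fields only at the point of use; several_conditions_are_satisfied gets a placeholder body here since the real conditions aren't shown in the post:

    def several_conditions_are_satisfied(fields):
        return len(fields) > 1   # placeholder for the real filter logic

    matrix = []
    with open("bigfile") as f:
        for line in f:                       # lazy iteration, no big batches
            data = line.rstrip("\n").split("\t")
            if several_conditions_are_satisfied(data):
                matrix.append(line)          # one string per row, not a list
    print "Number of rows kept:", len(matrix)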

Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Peter Otten
OW Ghim Siong wrote: > Hi all, I have a big file, 1.5GB in size, with about 6 million lines of tab-delimited data. I have to perform some filtering on the data and keep the good rows. After filtering, about 5.5 million rows remain. As you might have already guessed, I have to read them in batches ...

Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Ulrich Eckhardt
OW Ghim Siong wrote: > I have a big file, 1.5GB in size, with about 6 million lines of tab-delimited data. How many fields are there on each line? > I have to perform some filtering on the data and keep the good rows. After filtering, about 5.5 million rows remain. As you might have already guessed ...
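
The question matters because a row stored as a list costs one list object plus one string object per field, so the overhead grows with the field count. A back-of-envelope estimate, using approximate per-object sizes for 64-bit CPython 2.7 and a hypothetical ten fields per row (both are assumptions, not figures from the thread):

    rows = 5500000                 # ~5.5 million rows survive the filter
    fields_per_row = 10            # hypothetical answer to the question above
    str_header = 37                # approx. sys.getsizeof("") on this build
    list_header = 72 + 8 * fields_per_row   # empty list + a pointer per field

    as_lists = rows * (list_header + fields_per_row * str_header)
    as_strings = rows * str_header            # one string object per row
    print "overhead as list of lists:  %d MB" % (as_lists // 10**6)
    print "overhead as list of strings: %d MB" % (as_strings // 10**6)

On those assumptions, the list-of-lists layout carries roughly 2.9 GB of pure object overhead against about 0.2 GB for one string per row, before counting the character data itself.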

Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread OW Ghim Siong
Hi all, I have a big file, 1.5GB in size, with about 6 million lines of tab-delimited data. I have to perform some filtering on the data and keep the good rows. After filtering, about 5.5 million rows remain. As you might have already guessed, I have to read them in batches and I d...
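
If the downstream processing needs only a single pass over the data, a generator sidesteps the storage question entirely by never holding the 5.5 million rows at once. A minimal sketch, with a stand-in filter since the real conditions aren't shown in the post:

    def good_rows(path):
        # yield surviving rows one at a time; nothing accumulates in memory
        with open(path) as f:
            for line in f:
                data = line.split("\t")
                if len(data) > 1:      # stand-in for the real conditions
                    yield data

    for data in good_rows("bigfile"):
        pass   # process each row here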