On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <jgard...@jonathangardner.net> wrote:
> On Sat, Feb 20, 2010 at 5:07 PM, Vincent Davis <vinc...@vincentdavis.net> wrote:
>> Code is below. The files are about 5 MB and 230,000 rows each. When I
>> have 43 files of them and get to the 35th (reading it in), my system
>> gets so slow that it is nearly functionless. I am on a Mac, and Activity
>> Monitor shows that Python is using 2.99 GB of memory (of 4 GB)
>> (Python 2.6, 64-bit). getsizeof() returns 6424 bytes for alldata, so
>> I am not sure what is happening.
>
> With this kind of data set, you should start looking at BDBs or
> PostgreSQL to hold your data. While processing files this large is
> possible, it isn't easy. Your time is better spent letting the DB
> figure out how to arrange your data for you.
>
> --
> Jonathan Gardner
> jgard...@jonathangardner.net

I really do need all of it at once; it is DNA microarray data. Sure,
there are 230,000 rows, but only 4 columns of small numbers. Would it
help to convert them with float()? I need to at some point anyway. I
know numpy has a way to set the type for the whole array, astype() I
think. What I don't get is that getsizeof() shows the dict with all
the data as only 6424 bytes. What is using up all the memory?
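
A rough sketch of what I mean with astype() and of the getsizeof()
numbers; the row contents below are made up, and I am assuming the
values parse cleanly as floats:

import sys
import numpy as np

# sys.getsizeof() counts only the dict object itself (its table of
# pointers), not the row lists and float objects it points to. That is
# why the dict "measures" a few KB while Python is really holding
# gigabytes of small objects.
row = [1.0, 2.0, 3.0, 4.0]
print(sys.getsizeof({}))    # an empty dict is already a few hundred bytes
print(sys.getsizeof(row))   # one 4-element list, roughly 100 bytes...
print(sys.getsizeof(1.0))   # ...plus a couple dozen bytes per float object
# 230,000 rows per file, 43 files: that per-object overhead adds up fast.

# A float32 numpy array stores just the numbers, 4 bytes each:
# 230,000 rows * 4 columns * 4 bytes is about 3.7 MB per file,
# so roughly 160 MB for all 43 files.
data = np.array([row] * 230000)     # defaults to float64
small = data.astype(np.float32)     # astype() converts the whole array
print(data.nbytes)    # about 7.4 MB
print(small.nbytes)   # about 3.7 MB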
Vincent Davis
720-301-3003
vinc...@vincentdavis.net
my blog <http://vincentdavis.net> | LinkedIn <http://www.linkedin.com/in/vincentdavis>
--
http://mail.python.org/mailman/listinfo/python-list