On Sat, Feb 20, 2010 at 5:53 PM, Vincent Davis <vinc...@vincentdavis.net> wrote:
>> On Sat, Feb 20, 2010 at 6:44 PM, Jonathan
>> Gardner <jgard...@jonathangardner.net> wrote:
>>
>> With this kind of data set, you should start looking at BDBs or
>> PostgreSQL to hold your data. While processing files this large is
>> possible, it isn't easy. Your time is better spent letting the DB
>> figure out how to arrange your data for you.
>
> I really do need all of it at one time. It is DNA microarray data. Sure,
> there are 230,000 rows, but only 4 columns of small numbers. Would it help
> to make them float()? I need to at some point. I know in numpy there is a
> way to set the type for the whole array, "astype()" I think.
> What I don't get is that it shows the size of the dict with all the data
> as only 6424 bytes. What is using up all the memory?
>
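About the 6424 bytes: sys.getsizeof() only measures the dict object itself,
not the lists and numbers it points to, so it will always look tiny. Roughly
like this (I'm guessing at how the dict is laid out; the column names below
are made up):

import sys
import numpy as np

# Guessed layout: one Python list of 230,000 small numbers per column.
data = {col: [0.0] * 230000 for col in ("a", "b", "c", "d")}

# getsizeof() reports only the dict object itself -- the hash table and its
# pointers -- not the four lists or the float objects inside them, so the
# number it prints stays small no matter how big the data really is.
print(sys.getsizeof(data))

# The same values as one contiguous float64 array:
# 230,000 rows * 4 columns * 8 bytes, about 7 MB.
arr = np.array([data[c] for c in ("a", "b", "c", "d")], dtype=np.float64).T
print(arr.shape, arr.nbytes)

# If the values start life as strings read from the file, astype() converts
# the whole array in one call:
# floats = string_array.astype(np.float64)

A float64 array of that shape is only a few megabytes, so if memory is
blowing up, the per-object overhead of plain Python lists and floats is the
likely culprit, not the numbers themselves.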
Look into getting PostgreSQL to organize the data for you. It's much easier
to do the processing properly with a database handle than with a file
handle. You may also discover that writing functions in Python inside of
PostgreSQL can scale very well for whatever data needs you have. (A rough
sketch of what that can look like is below the sig.)

-- 
Jonathan Gardner
jgard...@jonathangardner.net
-- 
http://mail.python.org/mailman/listinfo/python-list
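A minimal sketch, assuming psycopg2 on the client side and the plpythonu
language installed in the database; the database, table, column, and file
names are all made up:

import psycopg2

# Connect to a hypothetical local database.
conn = psycopg2.connect("dbname=microarray")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE probes (
        probe_id integer,
        a real, b real, c real, d real
    )
""")

# COPY streams the whole tab-separated file in one round trip instead of
# issuing 230,000 separate INSERT statements.
with open("microarray.txt") as f:
    cur.copy_from(f, "probes", sep="\t",
                  columns=("probe_id", "a", "b", "c", "d"))

# A function written in Python but executed inside the server (PL/Python).
cur.execute("""
    CREATE OR REPLACE FUNCTION row_mean(a real, b real, c real, d real)
    RETURNS real AS $$
        return (a + b + c + d) / 4.0
    $$ LANGUAGE plpythonu
""")

cur.execute("SELECT probe_id, row_mean(a, b, c, d) FROM probes LIMIT 5")
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()

Once the rows are in a table you can filter, join, and aggregate them in SQL
(or in Python running inside the server), and the client process never has
to hold all 230,000 rows in memory at once.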