I am using shelve to store some data, since it is probably the best fit for my "data formats, number of columns, etc. can change at any time" problem. However, I seem to be dealing with bloat.

My original data is 33 MB. When each row is converted to a Python list and inserted into a shelve DB, it balloons to 69 MB. Now, there is some additional data in there, namely a list of all the keys containing data (as opposed to the keys that contain version/file/config information), but if I copy all the data over to a dict and dump the dict to a file using cPickle, that file is only 49 MB. I'm using pickle protocol 2 in both cases.

Is this expected? Is there really that much overhead to using shelve and dbm files? Are there any similar solutions that are more space efficient? I'd use a straight pickle.dump, but loading that would require pulling the entire thing into memory, and I don't want to do that every time.

[Note, for those who might suggest a standard DB: yes, I'd like to use a regular DB, but I have a domain where the number of data points in a sample may change at any time, so a timestamp-keyed dict is arguably the best solution, thus my use of shelve.]

Thanks for any pointers.

j

-- 
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/  ID 0xDB26D7CE
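For anyone who wants to reproduce the comparison, here is a minimal sketch of the two setups described above: the same timestamp-keyed dict of rows written once through shelve (one pickle per key, plus dbm overhead) and once as a single pickle.dump, both with protocol 2. The sample data is made up, and the script uses Python 3's pickle rather than the cPickle module mentioned in the post; the relative sizes on your machine will depend on the dbm backend shelve picks.

```python
import os
import pickle
import shelve
import tempfile

# Hypothetical stand-in for the real dataset: timestamp-keyed rows of floats.
rows = {str(ts): [float(ts + i) for i in range(10)] for ts in range(1000)}

tmpdir = tempfile.mkdtemp()

# 1) shelve DB, pickle protocol 2: each key/value pair is pickled separately
# and stored in whatever dbm backend is available.
shelf_path = os.path.join(tmpdir, "data.shelf")
with shelve.open(shelf_path, protocol=2) as db:
    for key, row in rows.items():
        db[key] = row

# 2) one pickle.dump of the entire dict, also protocol 2.
pickle_path = os.path.join(tmpdir, "data.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(rows, f, protocol=2)

# Some dbm backends split the shelf across several files (.dir/.dat/.db),
# so sum every file the shelf created.
shelf_size = sum(
    os.path.getsize(os.path.join(tmpdir, name))
    for name in os.listdir(tmpdir)
    if name.startswith("data.shelf")
)
pickle_size = os.path.getsize(pickle_path)
print("shelve:", shelf_size, "bytes; single pickle:", pickle_size, "bytes")
```

The upside of the shelve version is the behavior the post is after: reopening the shelf and reading one key only unpickles that one value, instead of loading the whole dict into memory.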