I've been dumping a database in Python source format (for use with Python on an S60 mobile phone, actually) and I've noticed that importing it uses far more memory than the data structure actually needs once it is loaded.
The programs below create a file (z.py) containing a data structure which looks like this:

-- z.py ----------------------------------------------------
z = {
 0 : (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
 1 : (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
 2 : (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21),
[snip]
 998 : (998, 999, 1000, 1001, 1002, ..., 1012, 1013, 1014, 1015, 1016, 1017),
 999 : (999, 1000, 1001, 1002, 1003, ..., 1013, 1014, 1015, 1016, 1017, 1018),
}
------------------------------------------------------------

Under python2.2-python2.4 "import z" uses about 8 MB, whereas loading a pickled dump of the same data only takes about 450 kB. This has been improved in python2.5, where the import only takes 2.2 MB.

$ python2.5 memory_usage.py
Memory used to import is 2284 kB
Total size of repr(z) is 105215
Memory used to unpickle is 424 kB
Total size of repr(z) is 105215

$ python2.4 memory_usage.py
Memory used to import is 8360 kB
Total size of repr(z) is 105215
Memory used to unpickle is 456 kB
Total size of repr(z) is 105215

$ python2.3 memory_usage.py
Memory used to import is 8436 kB
Total size of repr(z) is 105215
Memory used to unpickle is 456 kB
Total size of repr(z) is 105215

$ python2.2 memory_usage.py
Memory used to import is 8568 kB
Total size of repr(z) is 105215
Memory used to unpickle is 392 kB
Total size of repr(z) is 105215

$ python2.1 memory_usage.py
Memory used to import is 10756 kB
Total size of repr(z) is 105215
Memory used to unpickle is 384 kB
Total size of repr(z) is 105215

Why does it take so much memory? Is it some consequence of the way the data structure is parsed?

Note that once the .pyc file has been created, subsequent runs take even less memory than the cPickle load.

S60 python is version 2.2.1. It doesn't have pickle unfortunately, but it does have marshal, and the data structures I need are marshal-able, so that provides a good solution to my actual problem.
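For anyone wanting to try the marshal route mentioned above, here is a minimal sketch rather than the code I actually run on the phone. The names build_table, save_marshal and load_marshal and the file name z.marshal are made up for illustration. Bear in mind that the marshal format is specific to the Python version, so the dump should be written by an interpreter compatible with the one that will load it.

```python
import marshal


def build_table(n=1000, width=20):
    """Build the same shape of data as z.py: {i: (i, i+1, ..., i+width-1)}."""
    table = {}
    for i in range(n):
        table[i] = tuple(range(i, i + width))
    return table


def save_marshal(obj, filename):
    """Write obj to filename in marshal format (binary mode is required)."""
    fd = open(filename, "wb")
    try:
        marshal.dump(obj, fd)
    finally:
        fd.close()


def load_marshal(filename):
    """Read back an object previously written by save_marshal()."""
    fd = open(filename, "rb")
    try:
        return marshal.load(fd)
    finally:
        fd.close()


if __name__ == "__main__":
    z = build_table()
    save_marshal(z, "z.marshal")   # hypothetical file name
    assert load_marshal("z.marshal") == z
```

Loading this way never goes through the source parser, which is the step that seems to cost the memory in the "import z" case.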
Save the two programs below with the names given to demonstrate the problem. Note that these use some Linux-isms to measure the memory used by the current process, which will need to be adapted if you don't run them on Linux!

-- memory_usage.py -----------------------------------------
import os
import sys
import re
from cPickle import dump

def memory():
    """Returns memory used (RSS) in kB"""
    # Linux only: read the resident set size from /proc
    status = open("/proc/self/status").read()
    match = re.search(r"(?m)^VmRSS:\s+(\d+)", status)
    memory = 0
    if match:
        memory = int(match.group(1))
    return memory

def write_file():
    """Write the file to be imported"""
    fd = open("z.py", "w")
    fd.write("z = {\n")
    for i in xrange(1000):
        fd.write(" %d : %r,\n" % (i, tuple(range(i,i+20))))
    fd.write("}\n")
    fd.close()

def main():
    write_file()
    before = memory()
    from z import z
    after = memory()
    print "Memory used to import is %s kB" % (after-before)
    print "Total size of repr(z) is ",len(repr(z))

    # Save a pickled copy for later
    dump(z, open("z.bin", "wb"))

    # Run the next part
    os.system("%s memory_usage1.py" % sys.executable)

if __name__ == "__main__":
    main()

-- memory_usage1.py ----------------------------------------
from memory_usage import memory
from cPickle import load

before = memory()
z = load(open("z.bin", "rb"))
after = memory()
print "Memory used to unpickle is %s kB" % (after-before)
print "Total size of repr(z) is ",len(repr(z))
------------------------------------------------------------

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
-- 
http://mail.python.org/mailman/listinfo/python-list