I'm using Python to do some log file analysis, and I need to store on disk a very large dict with tuples of strings as keys and lists of strings and numbers as values.
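For concreteness, the data looks roughly like this (the actual keys and fields don't matter; this is just the shape):

    # illustrative only -- tuple-of-strings keys, values are lists
    # mixing strings and numbers
    log_data = {
        ("2009-06-01", "GET"): ["/index.html", 200, 0.013],
        ("2009-06-01", "POST"): ["/login", 302, 0.051],
    }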
I started by using cPickle to save the instance of the class that contained this dict, but the pickling process started writing the file and then ate so much memory that my computer (4 GB RAM) crashed so badly that I had to press the reset button. I've never seen an out-of-memory error do this before. Is this normal?

(I know from the output written before the crash that my program had finished building the dict and had started the pickle. When I tried running the other program that reads the pickle and analyzes the data in it, it gave an error because the file was incomplete, so I know where in my code the crash happened.)

From searching the web, I get the impression that pickle uses a lot of memory because it checks for recursion and other things that would break simpler serialization methods. So I've switched to using marshal to save the dict itself (the only persistent thing in the class, which otherwise just has convenience methods for adding data to the dict and searching it in the second stage of analysis); a sketch of what I'm doing now is at the end of this message.

I also found some references to HDF5 tables for getting around the pickling memory problem, but I got the impression they only work with fixed columns, not a somewhat complex data structure like mine.

Any comments, suggestions?
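For reference, the marshal-based save/load I have now looks roughly like this (the file name is made up):

    import marshal

    # save: marshal handles plain dicts, tuples, lists, strings and
    # numbers, which is all this structure contains
    f = open("logdata.marshal", "wb")
    marshal.dump(log_data, f)
    f.close()

    # load it back in the second-stage analysis program
    f = open("logdata.marshal", "rb")
    log_data = marshal.load(f)
    f.close()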