MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 05/03/2011 01:56, Bob Fnord wrote:
> > I'm using python to do some log file analysis and I need to store
> > on disk a very large dict with tuples of strings as keys and
> > lists of strings and numbers as values.
> >
> > I started by using cPickle to save the instance of the class that
> > contained this dict, but the pickling process started to write
> > the file but ate so much memory that my computer (4 GB RAM)
> > crashed so badly that I had to press the reset button. I've never
> > seen out-of-memory errors do this before. Is this normal?
> >
> > (I know from the output that got written before the crash that my
> > program had finished building the dict and started the
> > pickle. When I tried running the other program that reads the
> > pickle and analyzes the data in it, it gave an error because the
> > file was incomplete. So I know where in my code the crash
> > happened.)
> >
> > From searching the web, I get the impression that pickle uses a
> > lot of memory because it checked for recursion and other things
> > that could break other serialization methods. So I've switched to
> > using marshal to save the dict itself (the only persistent thing
> > in the class, which just has convenience methods for adding data
> > to the dict and searching it for the second stage of analysis).
> >
> > I found some references to h5 tables for getting around the
> > pickling memory problem, but I got the impression they only work
> > with fixed columns, not a somewhat complex data structure like
> > mine.
> >
> > Any comments, suggestions?
> >
> Would a database work?

I want a portable data file (one that can be moved around the filesystem
or copied to another machine and used), so I don't want to use MySQL or
Postgres. I guess the "sqlite" approach would work, but I think it would
be difficult to turn the tuples of strings and lists of strings and
numbers into database table rows.

Would a database in a file have any advantages over a file made by
marshal or shelve?

I'm more worried about the fact that a Python program in user space can
bring down the computer!
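
For what it's worth, here is a rough sketch of how I imagine the "sqlite"
approach could look. It assumes the keys and values can be round-tripped
through JSON (the key tuple is stored as a JSON list). The table name,
column names and the helper functions open_store, save_items and load_item
are all made up for illustration, and I haven't tried this on the real
data:

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) a single-file SQLite store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def save_items(conn, items, batch_size=10000):
    """Write (key_tuple, value_list) pairs in batches rather than all at once."""
    insert = "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)"
    batch = []
    for key, value in items:
        # Encode the tuple key as a JSON list and the value list as JSON text.
        batch.append((json.dumps(list(key)), json.dumps(value)))
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            conn.commit()
            batch = []
    if batch:
        conn.executemany(insert, batch)
        conn.commit()


def load_item(conn, key):
    """Look up one key tuple; returns the stored list, or None if absent."""
    row = conn.execute(
        "SELECT value FROM entries WHERE key = ?", (json.dumps(list(key)),)
    ).fetchone()
    return None if row is None else json.loads(row[0])


if __name__ == "__main__":
    conn = open_store(":memory:")  # use a filename for a portable data file
    data = {("host1", "GET"): ["/index.html", 200, 1.5]}
    save_items(conn, data.items())
    print(load_item(conn, ("host1", "GET")))
```

If something like this works, the one advantage I can see over marshal is
that the second-stage program could look up individual keys without
loading the whole dict back into memory.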