On Mon, Sep 6, 2010 at 3:01 PM, Edward Grefenstette <egre...@gmail.com> wrote:
> Dear Pythonistas,
>
> For a project I'm working on, I need to store fairly large
> dictionaries (several million keys) in some form (obviously not in
> memory). The obvious course of action was to use a database of some
> sort.
>
> The operation is pretty simple: a function is handed a generator that
> gives it keys and values, and it maps the keys to the values in a
> non-relational database (simples!).
>
> I wrote some code implementing this using anydbm (which used dbhash on
> my system), and it worked fine for about a million entries, but then
> crashed, raising a DBPageNotFoundError. I did a little digging around
> and couldn't figure out what was causing this or how to fix it.
>
> I then quickly swapped anydbm for good ol' fashioned dbm, which uses
> gdbm, and it ran even faster and for a little longer, but after a
> million entries or so it raised the ever-so-unhelpful "gdbm fatal:
> write error".
>
> I then threw caution to the winds and tried simply using cPickle's
> dump in the hope of obtaining some data persistence, but it crashed
> fairly early with an "IOError: [Errno 122] Disk quota exceeded".
>
> Now the question is: is there something wrong with these dbms? Can
> they not deal with very large sets of data? If not, is there a more
> suitable tool for my needs? Or is the problem unrelated and something
> to do with my lab computer?
>
> Best,
> Edward
> --
Just as a guess, I'd say that you're hitting a disk quota with your
several-million-key dbm; the "IOError: [Errno 122] Disk quota exceeded"
from your cPickle attempt points the same way. You might want to talk to
the lab administrator about raising the quota.
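
For what it's worth, below is a rough sketch of the kind of loop you
describe, assuming Python 2 and the gdbm module; the function name, file
name, and sync interval are made up for illustration, not taken from your
code. Running it while keeping an eye on the file's size (du -h) or on
`quota -s` (if quotas are enabled on the lab machine) should make it
obvious whether the quota is the culprit.

import gdbm

def store_pairs(pairs, path='mapping.gdbm'):
    """Write (key, value) string pairs from a generator into a gdbm file."""
    db = gdbm.open(path, 'nf')   # 'n' = create a new database, 'f' = fast mode
    try:
        for i, (key, value) in enumerate(pairs):
            db[key] = value      # gdbm wants both key and value as str
            if i % 100000 == 0:
                db.sync()        # flush periodically so a crash loses less work
    finally:
        db.close()

if __name__ == '__main__':
    # A few million synthetic entries; the resulting file easily runs to
    # hundreds of megabytes, which is where a tight quota tends to bite.
    pairs = (('key%d' % n, 'value%d' % n) for n in xrange(3000000))
    store_pairs(pairs)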