Dear Pythonistas,

For a project I'm working on, I need to store fairly large dictionaries (several million keys) in some persistent form (obviously not in memory). The obvious course of action was to use a database of some sort.

The operation is pretty simple: a function is handed a generator that yields keys and values, and it maps the keys to the values in a non-relational database (simples!).
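To make it concrete, the code boils down to something like this (a simplified sketch: the function and file names are made up, and I'm assuming the keys and values are already plain strings, since dbm-style databases only store strings):

    import anydbm

    def store_pairs(pairs, db_path):
        """Write (key, value) pairs from a generator into an on-disk dbm file.

        Names here are purely illustrative; anydbm picks whichever dbm
        implementation is available (dbhash on my system).
        """
        db = anydbm.open(db_path, 'n')  # 'n': always create a new, empty database
        try:
            for key, value in pairs:
                db[key] = value
        finally:
            db.close()

    # e.g. store_pairs(((str(i), str(i * i)) for i in xrange(5000000)), 'mapping.db')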
I wrote some code implementing this using anydbm (which used dbhash on my system), and it worked fine for about a million entries, but then crashed, raising a DBPageNotFoundError. I did a little digging around and couldn't figure out what was causing this or how to fix it.

I then quickly swapped anydbm for good ol' fashioned dbm, which uses gdbm. It ran even faster and lasted a little longer, but after a million entries or so it raised the ever-so-unhelpful "gdbm fatal: write error". I then threw caution to the winds and tried simply using cPickle's dump in the hope of getting some data persistence, but it crashed fairly early with an "IOError: [Errno 122] Disk quota exceeded".

Now the question is: is there something wrong with these dbms? Can they not deal with very large sets of data? If not, is there a tool better suited to my needs? Or is the problem unrelated to the dbms and something to do with my lab computer?

Best,
Edward
-- 
http://mail.python.org/mailman/listinfo/python-list