lazy <[EMAIL PROTECTED]> wrote: ... > key1=>{key11=>[1,2] , key12=>[6,7] , .... } ... > Im processesing HUGE(~100M inserts into the dictionary) data.
What will you need from the "saved" version later? If you need lookups by single keys or key pairs then a relational DB may be best; if you need to recover the whole huge dictionary into memory (assuming it will fit, e.g. you have a 64-bit machine with ample RAM) then it might be more efficient to "dump" things into a custom-designed format. If I was uncertain about future uses of the saved data, I'd go for the maximal flexibility afforded by a relational DB -- if you save to a relational DB you can later easily use the data in every which way, including ad-hoc ones (such flexibility is a key architectural feature of relational databases). E.g., a single table with 4 fields might be enough: key, subkey, leftint, rightint -- each field of the appropriate type (you do tell us that the two ints are smallish integers, but we don't know directly about the types of the keys -- besides a hint that the main keys are presumably strings of some kind since that's what bsddb likes as its own keys, but still, choosing the right kind of string may be important in order to save space and obtain good performance). The (key,subkey) pair would be the primary key on that table, and unless and until you need peculiar "navigation" in the future you may not require other indices, constraints, or whatever. At the scale you're talking about (multi-gigabyte, it appears) it's probably worth using a powerful DB (such as PostgreSQL, interfacing it to Python via psycopg2) rather than a lightweight one (such as sqlite, which comes with Python 2.5); however, depending on how critical performance is, such issues are often best resolved by benchmarking (similarly, you might benchmark the performance for inserting into the properly constrained table from the start, vs originally making the table constraints-less and using an ALTER TABLE later to add the primary key constraint/index that will be useful for later lookups; "bulk insertions" into DBs can often benefit from the latter idea). Alex -- http://mail.python.org/mailman/listinfo/python-list