Hi, I have a dictionary that looks something like this:

    key1 => {key11 => [1, 2], key12 => [6, 7], ...}

For lack of better wording, I'll call the outer dictionary dict1 and its
value (the inner dictionary) dict2. dict1 maps a string key to dict2,
and dict2 maps a string key to a small fixed-size list of 2 integers.

I'm processing HUGE data (~100M inserts into the dictionary). I've tried
two options; both seem too slow, and I'm seeking suggestions to improve
the speed. The code is in bits and pieces, so I'm just giving the idea
here; rough sketches of both options follow at the end of this message.

1) Use bsddb. When an insert is done, the db gets key1 as the key and
the pickled dict2 as the value (i.e. db[key1] is the pickled dict2).
After every 1000 inserts I close and reopen the db in order to flush the
contents to disk. Also, when I insert a key that is already present, I
unpickle the value, change something in dict2, and then pickle it back
into the bsddb.

2) Instead of pickling dict2 and storing it in bsddb immediately, I keep
dict1 (the outer dictionary) in memory, and after every 1000 inserts I
store it to bsddb as before, pickling each individual value. The
advantage is that when an insert hits a key that is already in memory, I
just adjust the value and don't need to unpickle and pickle it back. If
the key is not present in memory, I still need to look it up in bsddb.

This is still not fast enough, even with option 2. Before inserting, I
do some processing on each line, so the bottleneck is not clear to me
(i.e. whether it's the processing or the inserts into the db), but I
guess it's mainly the pickling and unpickling.

Any suggestions will be appreciated :)
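Here is roughly what option 1 looks like. This is a simplified sketch,
not my actual code: the filename, the records() generator, and the
insert() helper are all stand-ins (Python 2, hence bsddb and cPickle):

import bsddb
import cPickle as pickle

def records():
    # stand-in for my real input; the real loop parses ~100M lines
    yield 'key1', 'key11', [1, 2]
    yield 'key1', 'key12', [6, 7]

def insert(db, key1, key2, pair):
    # if key1 already exists, unpickle dict2, update it, pickle it back
    if db.has_key(key1):
        dict2 = pickle.loads(db[key1])
    else:
        dict2 = {}
    dict2[key2] = pair                  # pair is the 2-int list
    db[key1] = pickle.dumps(dict2, -1)  # -1 = highest pickle protocol

db = bsddb.hashopen('data.db', 'c')     # 'data.db' is a made-up name
count = 0
for key1, key2, pair in records():
    insert(db, key1, key2, pair)
    count += 1
    if count % 1000 == 0:
        db.close()                      # close/reopen to flush to disk
        db = bsddb.hashopen('data.db', 'c')
db.close()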
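And option 2, again as a rough sketch under the same assumptions. The
only real difference is the in-memory cache; I've also used db.sync()
here to flush instead of closing and reopening, which should amount to
the same thing:

import bsddb
import cPickle as pickle

def records():
    # stand-in for my real input; the real loop parses ~100M lines
    yield 'key1', 'key11', [1, 2]
    yield 'key1', 'key12', [6, 7]

db = bsddb.hashopen('data.db', 'c')     # 'data.db' is a made-up name
cache = {}                              # in-memory dict1: key1 -> dict2
count = 0

for key1, key2, pair in records():
    if key1 not in cache:
        if db.has_key(key1):
            cache[key1] = pickle.loads(db[key1])  # fall back to bsddb
        else:
            cache[key1] = {}
    cache[key1][key2] = pair    # in-memory update, no pickle round trip
    count += 1
    if count % 1000 == 0:
        for k, d in cache.iteritems():
            db[k] = pickle.dumps(d, -1)  # pickle each value individually
        cache.clear()
        db.sync()                        # flush to disk

for k, d in cache.iteritems():           # final flush of what's left
    db[k] = pickle.dumps(d, -1)
db.close()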
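To narrow down where the time goes, I suppose I could time the two
phases separately with something like the following (process() here is
just a made-up stand-in for my real per-line processing):

import time

def process(line):
    # stand-in for my real per-line processing
    parts = line.split()
    return parts[0], parts[1], [int(parts[2]), int(parts[3])]

process_time = 0.0
insert_time = 0.0

for line in ['key1 key11 1 2', 'key1 key12 6 7']:  # stand-in input
    t0 = time.time()
    record = process(line)
    process_time += time.time() - t0

    t0 = time.time()
    # insert(db, *record)   # the bsddb insert from the sketches above
    insert_time += time.time() - t0

print 'processing: %.2fs  inserting: %.2fs' % (process_time, insert_time)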