On Jan 28, 5:14 pm, John Machin <sjmac...@lexicon.net> wrote:
> On Jan 29, 3:13 am, perfr...@gmail.com wrote:
>
> > hello all,
>
> > I have a large dictionary which contains about 10 keys; each key has a
> > value which is a list containing about 1 to 5 million (small)
> > dictionaries. For example,
>
> > mydict = {key1: [{'a': 1, 'b': 2, 'c': 'hello'},
> >                  {'d': 3, 'e': 4, 'f': 'world'}, ...],
> >           key2: [...]}
>
> > In total there are about 10 to 15 million dictionaries if we
> > concatenate together all the values of every key in 'mydict'. mydict
> > is a structure that represents data in a very large file (about 800
> > megabytes).
>
> > What is the fastest way to pickle 'mydict' into a file? Right now I am
> > experiencing a lot of difficulties with cPickle when using it like
> > this:
>
> > import cPickle as pickle
> > pfile = open(my_file, 'wb')
> > pickle.dump(mydict, pfile)
> > pfile.close()
>
> > This creates extremely large files (~ 300 MB), and it does so
> > *extremely* slowly. It writes about 1 megabyte per 5 or 10 seconds and
> > it gets slower and slower. It takes almost an hour, if not more, to
> > write this pickle object to file.
>
> > Is there any way to speed this up? I don't mind the large file...
> > after all, the text file with the data used to make the dictionary was
> > larger (~ 800 MB) than the file it eventually creates, which is 300
> > MB. But I do care about speed...
>
> > I have tried optimizing this by using:
>
> > s = pickle.dumps(mydict, 2)
> > pfile.write(s)
>
> > but this takes just as long... Any ideas? Is there a different module
> > I could use that's more suitable for large dictionaries?
> > Thank you very much.
>
> Pardon me if I'm asking the "bleedin' obvious", but have you checked
> how much virtual memory this is taking up compared to how much real
> memory you have? If the slowness is due to pagefile I/O, consider
> doing "about 10" separate pickles (one for each key in your top-level
> dictionary).
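For reference, the per-key pickling suggested above would look roughly like the sketch below. It is only a sketch: it assumes the top-level keys are strings that are safe to embed in filenames, and 'dump_per_key' and 'prefix' are made-up names. It also passes protocol 2 and opens the files in binary mode, since the default protocol 0 writes a slow, bulky text format.

import cPickle as pickle

def dump_per_key(mydict, prefix):
    # One pickle file per top-level key, written with the binary
    # protocol 2 instead of the default text protocol 0.
    # Assumes the keys are filename-safe strings.
    for key, records in mydict.iteritems():
        pfile = open('%s.%s.pkl' % (prefix, key), 'wb')
        try:
            pickle.dump(records, pfile, 2)
        finally:
            pfile.close()

Loading then becomes a matter of unpickling only the keys you need, one file at a time.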
The slowness is due to CPU, according to the unix program 'top' when I profile my program... I think all the work is in the file I/O. The machine I am using has several GB of RAM, and memory is not heavily taxed at all. Do you know how file I/O can be sped up?

In reply to the other poster: I thought 'shelve' simply calls pickle. If that's the case, it wouldn't be any faster, right?
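For what it's worth, shelve does pickle each value under the hood, so it will not beat cPickle on raw serialization speed, but it stores one pickled entry per key in a dbm-backed file rather than building one 300 MB blob, and it accepts a pickle protocol. A rough sketch, assuming the top-level keys are strings (shelve requires string keys); 'dump_to_shelf' and 'path' are illustrative names:

import shelve

def dump_to_shelf(mydict, path):
    # flag='n' always creates a fresh database; protocol=2 uses the
    # binary pickle format for each stored value.
    db = shelve.open(path, flag='n', protocol=2)
    try:
        for key, records in mydict.iteritems():
            db[key] = records  # each list is pickled and written separately
    finally:
        db.close()

Each key can then be read back independently later via shelve.open(path)[key], without loading the whole structure at once.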