On Jan 29, 9:43 am, perfr...@gmail.com wrote:
> On Jan 28, 5:14 pm, John Machin <sjmac...@lexicon.net> wrote:
> > On Jan 29, 3:13 am, perfr...@gmail.com wrote:
> > > hello all,
> > >
> > > i have a large dictionary which contains about 10 keys, each key has
> > > a value which is a list containing about 1 to 5 million (small)
> > > dictionaries. for example,
> > >
> > > mydict = {key1: [{'a': 1, 'b': 2, 'c': 'hello'},
> > >                  {'d': 3, 'e': 4, 'f': 'world'}, ...],
> > >           key2: [...]}
> > >
> > > in total there are about 10 to 15 million small dictionaries if we
> > > concatenate together all the values of every key in 'mydict'. mydict
> > > is a structure that represents data in a very large file (about 800
> > > megabytes).
> > >
> > > what is the fastest way to pickle 'mydict' into a file? right now i
> > > am experiencing a lot of difficulties with cPickle when using it
> > > like this:
> > >
> > > import cPickle as pickle
> > > pfile = open(my_file, 'wb')
> > > pickle.dump(mydict, pfile)
> > > pfile.close()
> > >
> > > this creates extremely large files (~ 300 MB), and it does so
> > > *extremely* slowly. it writes about 1 megabyte every 5 to 10
> > > seconds, and it gets slower and slower. it takes almost an hour, if
> > > not more, to write this pickle to file.
> > >
> > > is there any way to speed this up? i don't mind the large file...
> > > after all, the text file the dictionary was built from was larger
> > > (~ 800 MB) than the 300 MB file it eventually creates. but i do
> > > care about speed...
> > >
> > > i have tried optimizing this by using:
> > >
> > > s = pickle.dumps(mydict, 2)
> > > pfile.write(s)
> > >
> > > but this takes just as long... any ideas? is there a different
> > > module i could use that's more suitable for large dictionaries?
> > > thank you very much.
> >
> > Pardon me if I'm asking the "bleedin' obvious", but have you checked
> > how much virtual memory this is taking up compared to how much real
> > memory you have? If the slowness is due to pagefile I/O, consider
> > doing "about 10" separate pickles (one for each key in your top-level
> > dictionary).
>
> the slowness is due to CPU, according to the unix program 'top'... i
> think all the work is in the file I/O. the machine i am using has
> several GB of ram, and memory is not heavily taxed at all. do you know
> how file I/O can be sped up?
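Before the questions below, here is a minimal sketch of the "one pickle
per key" idea, assuming Python 2 / cPickle; 'mydict' is your dictionary
from the thread, and the file-naming scheme is made up here (keys are
assumed to be usable in file names):

    import cPickle as pickle

    def dump_per_key(mydict, basename):
        # write one binary pickle per top-level key, so each dump
        # serializes 1-5 million small dicts instead of all 10-15 million
        for key in mydict:
            f = open('%s.%s.pkl' % (basename, key), 'wb')
            try:
                # the highest protocol (2 here) is binary, and is much
                # faster and smaller than the default text protocol 0
                pickle.dump(mydict[key], f, pickle.HIGHEST_PROTOCOL)
            finally:
                f.close()

    def load_per_key(basename, keys):
        # rebuild the full dictionary from the per-key pickle files
        result = {}
        for key in keys:
            f = open('%s.%s.pkl' % (basename, key), 'rb')
            try:
                result[key] = pickle.load(f)
            finally:
                f.close()
        return result

Note the 'wb' mode: the binary protocols can be corrupted on platforms
that translate line endings if the file is opened in text mode.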
More quick silly questions:

(1) How long does it take to load that 300MB pickle back into memory
using:
    (a) cPickle.load(f)
    (b) f.read()
?
What else is happening on the machine while you are creating the
pickle?

(2) How does
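For question (1), a quick way to get those two numbers, assuming
Python 2 and a pickle file named 'mydict.pkl' (the name is a
placeholder):

    import time
    import cPickle

    # (a) full unpickle: file I/O plus reconstructing every object
    f = open('mydict.pkl', 'rb')
    t0 = time.time()
    mydict = cPickle.load(f)
    f.close()
    print 'cPickle.load: %.1f seconds' % (time.time() - t0)

    # (b) raw read of the same bytes: a lower bound that is pure file I/O
    f = open('mydict.pkl', 'rb')
    t0 = time.time()
    data = f.read()
    f.close()
    print 'f.read:       %.1f seconds' % (time.time() - t0)

If (a) is far slower than (b), the time is going into cPickle's object
construction rather than into file I/O.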