On Mar 20, 5:26 pm, Jim Garrison <j...@acm.org> wrote:
> John Machin wrote:
> > On Mar 21, 9:25 am, Jim Garrison <j...@acm.org> wrote:
> >> I'm converting a Perl system to Python, and have run into a severe
> >> performance problem with pickle.
>
> >> One facet of the system involves scanning and loading into memory a
> >> couple of parallel directory trees containing OTO 10^4 files. The
> >> trees don't change during development/testing and the scan takes 30-40
> >> seconds, so to save time I cache the loaded tree structure to disk, in
> >> Perl with module Storable, and in Python with pickle.
>
> >> In Perl, the save operation produces a file of about 3MB, and both
> >> save and restore take a second or two. In Python, pickle.dump()
> >> produces a similar-size file but takes 20 seconds, and pickle.load()
> >> takes 45 seconds, which is actually LONGER than the time required to
> >> scan the directory trees.
>
> >> Is there anything I can do to speed up pickle.load() to get
> >> performance comparable to Perl's Storable?
>
> > Have you read this:
> > http://www.python.org/doc/2.6/library/pickle.html
> > ?
> > Have you considered using cPickle instead of pickle?
> > Have you considered using *ickle.dump(..., protocol=-1) ?
>
> I'm using Python 3 on Windows (Server 2003). According to the docs
>
> "The pickle module has an transparent optimizer (_pickle) written
> in C. It is used whenever available. Otherwise the pure Python
> implementation is used."
>
> How can I tell if _pickle is being used?
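
One way to check which implementation is active is to look at where pickle.Pickler is defined. This is a minimal sketch that assumes a CPython 3 build, where the accelerated class reports its module as _pickle and the pure Python fallback reports pickle. The helper names pickle_implementation and using_c_accelerator are made up for illustration, and this has not been verified on Python 3.0 specifically.

```python
import pickle


def pickle_implementation():
    """Return the name of the module that defines pickle.Pickler."""
    # Assumption: on CPython 3 the C accelerator replaces pickle.Pickler
    # with the class from _pickle, so its __module__ is "_pickle".
    # Without the accelerator the class is defined in pickle itself.
    return pickle.Pickler.__module__


def using_c_accelerator():
    """True if the C implementation appears to be in use."""
    return pickle_implementation() == "_pickle"


print("Pickler is implemented in:", pickle_implementation())
print("C accelerator in use:", using_c_accelerator())
```

If this reports the C implementation and loading is still slow, the pickler itself is probably not the bottleneck.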

The slow performance is most likely due to the poor performance of Python 3's IO, which is caused by (among other things) a bad buffering strategy. It's a Python 3 growing pain, and the IO layer is being rewritten. Python 3.1 should be much faster, but it hasn't been released yet.

As a workaround, mmap the file instead. For example (untested):

    import mmap
    import pickle

    f = open('dirlisting.dat', 'rb')
    try:
        # Seek to the end to get the file size, then rewind.
        f.seek(0, 2)
        size = f.tell()
        f.seek(0, 0)
        # Map the whole file read-only and unpickle straight from the map.
        m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
        try:
            dir_listing = pickle.loads(m)
        finally:
            m.close()
    finally:
        f.close()

Pickling the output is left as an exercise.

Carl Banks
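
For the write side, here is a minimal sketch that mirrors the mmap read. It assumes the file layer is the bottleneck, so it serializes to a bytes object in memory with pickle.dumps() and writes it in a single call. The names save_listing and load_listing, and the sample tree, are hypothetical and only for illustration.

```python
import mmap
import os
import pickle
import tempfile


def save_listing(obj, path):
    """Pickle obj in memory, then write the bytes with one write call."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as f:
        f.write(data)


def load_listing(path):
    """Unpickle from a read-only memory map of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return pickle.loads(m)
        finally:
            m.close()


if __name__ == "__main__":
    # Hypothetical stand-in for the scanned directory tree.
    tree = {"src": ["a.txt", "b.txt"], "docs": {"img": ["c.png"]}}
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "dirlisting.dat")
        save_listing(tree, cache)
        print(load_listing(cache) == tree)
```

Whether this is any faster than a plain pickle.dump() to the file object depends on the Python version, so it is worth timing both.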