On Wed, 17 Mar 2010 19:04:14 -0300, Keir Vaughan-taylor <kei...@gmail.com> wrote:

I am traversing a large set of directories using

for root, dirs, files in os.walk(basedir):
    run program

Because the directory set is huge, the traversal takes days to
complete.
Sometimes it crashes because of a programming
error.
As each directory is processed, its name is written to
a file.
I want to be able to restart the walk from the directory where it
crashed.

Is this possible?

If the 'dirs' list were guaranteed to be sorted, you could remove, at each level, all the directories already traversed. But os.walk makes no such guarantee :(
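
That said, the os.walk documentation allows 'dirs' to be modified in place when topdown is true (the default), and that includes sorting it. Below is a minimal sketch of the pruning idea under that assumption. resume_walk and last_done are hypothetical names, and it only works if the interrupted run also used the sorted order and logged each directory after processing it.

```python
import os

def resume_walk(basedir, last_done=None):
    """Walk basedir in sorted order, skipping everything up to and
    including last_done (the last directory that was fully processed).
    Hypothetical helper, not part of the standard library."""
    target = None
    if last_done is not None:
        rel = os.path.relpath(last_done, basedir)
        target = [] if rel == os.curdir else rel.split(os.sep)
    for root, dirs, files in os.walk(basedir):
        dirs.sort()  # in-place sort fixes the order os.walk descends in
        if target is not None:
            rel = os.path.relpath(root, basedir)
            parts = [] if rel == os.curdir else rel.split(os.sep)
            depth = len(parts)
            if parts == target[:depth]:
                # root is the checkpoint or one of its ancestors: it was
                # already processed, so only prune its earlier siblings.
                if depth < len(target):
                    dirs[:] = [d for d in dirs if d >= target[depth]]
                continue
        yield root, dirs, files
```

To restart, read the last line of the log file and pass it as last_done; the loop body stays the same as with os.walk.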

Perhaps a better approach would be to collect, once, all the directories to be processed and write them to a text file -- these are the pending directories. Then read the pending file and process every directory in it. If the process aborts for any reason, manually delete the lines already processed and restart.
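
A rough sketch of the pending-file idea follows. The function names and the separate "done" log are my assumptions, not something from the original question; the log stands in for deleting processed lines by hand. It also assumes no directory name contains a newline.

```python
import os

def write_pending(basedir, pending_path):
    """Run once: record every directory under basedir, one per line."""
    with open(pending_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(basedir):
            out.write(root + "\n")

def process_pending(pending_path, done_path, process):
    """Process each pending directory not yet listed in the done log.
    'process' is whatever callable does the real work (hypothetical)."""
    done = set()
    if os.path.exists(done_path):
        with open(done_path, encoding="utf-8") as f:
            done = {line.rstrip("\n") for line in f}
    with open(pending_path, encoding="utf-8") as pending, \
         open(done_path, "a", encoding="utf-8") as log:
        for line in pending:
            directory = line.rstrip("\n")
            if directory in done:
                continue  # finished in an earlier run
            process(directory)
            log.write(directory + "\n")
            log.flush()  # keep the log current in case of a crash
```

After a crash, fix the bug and call process_pending again with the same two files; directories already in the done log are skipped.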

If you use a database instead of a text file, and mark entries as "done" after processing, you can avoid that last manual step and the whole process may be kept running automatically. In some cases you may want to choose the starting point at random.
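
For illustration, here is one way this could look with the standard sqlite3 module. The table layout and the function names are assumptions made for this sketch; the shuffle flag is one possible reading of "starting point at random".

```python
import os
import sqlite3

def open_queue(db_path):
    """Open (or create) the database holding the directory queue."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dirs ("
        "path TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn

def enqueue_tree(conn, basedir):
    """Run once: store every directory under basedir as pending."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO dirs (path) VALUES (?)",
            ((root,) for root, dirs, files in os.walk(basedir)),
        )

def process_queue(conn, process, shuffle=False):
    """Process every entry not yet marked done, marking each as it finishes."""
    order = "RANDOM()" if shuffle else "rowid"
    pending = conn.execute(
        "SELECT path FROM dirs WHERE done = 0 ORDER BY " + order
    ).fetchall()
    for (path,) in pending:
        process(path)
        with conn:
            conn.execute("UPDATE dirs SET done = 1 WHERE path = ?", (path,))
```

Restarting is just calling process_queue again on the same database file; only rows still marked done = 0 are selected.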

--
Gabriel Genellina
