On Nov 24, 7:57 am, "Andre Meyer" <[EMAIL PROTECTED]> wrote:
>
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.
>
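On the literal question: os.walk() only ever walks one tree, but for a one-way backup that is enough, because the matching path in the second tree can be derived from each directory's path relative to the source root. A minimal sketch under that assumption; the name mirror_tree and the "copy if missing or older" rule are illustrative choices, not something taken from the original post:

```python
import os
import shutil


def mirror_tree(src, dst):
    """One-way backup: copy files from src that are missing or older in dst."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(src):
        # Map the current source directory onto the destination tree.
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in filenames:
            s = os.path.join(dirpath, name)
            d = os.path.join(target_dir, name)
            # Copy when the target is missing or older than the source.
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)
                copied.append(os.path.relpath(d, dst))
    return copied
```

Calling mirror_tree('/data', '/backup/data') would return the list of relative paths it copied. Nothing is ever deleted from the destination, so this is a backup rather than a true two-way sync.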
I wrote a script to do this using the dircmp class in the filecmp module. I did something similar to this:

    import filecmp, os, shutil

    def backup(d1, d2):
        print('backing up %s to %s' % (d1, d2))
        compare = filecmp.dircmp(d1, d2)
        # Entries that exist only in the source: copy them over whole.
        for item in compare.left_only:
            fullpath = os.path.join(d1, item)
            if os.path.isdir(fullpath):
                shutil.copytree(fullpath, os.path.join(d2, item))
            elif os.path.isfile(fullpath):
                shutil.copy2(fullpath, d2)
        # Files on both sides that dircmp reports as different: overwrite.
        for item in compare.diff_files:
            shutil.copy2(os.path.join(d1, item), d2)
        # Directories on both sides: recurse into them.
        for item in compare.common_dirs:
            backup(os.path.join(d1, item), os.path.join(d2, item))

    if __name__ == '__main__':
        import sys
        if len(sys.argv) == 3:
            backup(sys.argv[1], sys.argv[2])

My actual script has some error checking and keeps up to 5 previous versions of a changed file. I find it very efficient, even with recursion, as it only copies those files that have changed. I sync somewhere around 5 GB worth of files nightly across the network and I haven't had any trouble.

Of course, if I just had rsync available, I would use that.

Hope this helps,

Pete

-- 
http://mail.python.org/mailman/listinfo/python-list
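The "keeps up to 5 previous versions" behaviour is mentioned above but not shown. One way it could be done is with numbered suffixes that are shifted up before each overwrite. This is only a sketch under that assumption: the names rotate_versions and copy_with_history and the name.1 .. name.5 scheme are hypothetical and not taken from the original script.

```python
import os
import shutil


def rotate_versions(path, keep=5):
    """Shift path.1 .. path.(keep-1) up by one, then move path to path.1."""
    oldest = '%s.%d' % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)  # drop the version that falls off the end
    # Work from the oldest surviving version down so nothing is overwritten.
    for n in range(keep - 1, 0, -1):
        older = '%s.%d' % (path, n)
        if os.path.exists(older):
            os.rename(older, '%s.%d' % (path, n + 1))
    if os.path.exists(path):
        os.rename(path, path + '.1')


def copy_with_history(src, dst_dir, keep=5):
    """Copy src into dst_dir, keeping earlier copies as name.1 .. name.keep."""
    target = os.path.join(dst_dir, os.path.basename(src))
    rotate_versions(target, keep)
    shutil.copy2(src, target)
    return target
```

In the backup() function above, the copy in the diff_files loop could be swapped for copy_with_history(os.path.join(d1, item), d2), so that only changed files accumulate history.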