Caleb Hattingh wrote:
> Hi everyone
>
> [Short version: I put some code below: what changes can make it run
> faster?]

On my slow notebook, your code takes about 1.5 seconds to do my
C:\Python24 dir. With a few changes my code does it in about 1 second.
Here is my code:

    import os, os.path, math

    def foldersize(fdir):
        """Returns the size of all data in folder fdir in bytes"""
        # Take only the first (top-level) entry from the os.walk generator
        root, dirs, files = os.walk(fdir).next()
        files = [os.path.join(root, x) for x in files]
        dirs = [os.path.join(root, x) for x in dirs]
        return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))

    suffixes = ['bytes','kb','mb','gb','tb']

    def prettier(bytesize):
        """Convert a number in bytes to a string in MB, GB, etc"""
        if bytesize < 1:
            # log is undefined at 0 (e.g. an empty folder)
            return "0 bytes"
        # What power of 1024 is less than or equal to bytesize?
        exponent = int(math.log(bytesize, 1024))
        if exponent > 4:
            return "%d bytes" % bytesize
        return "%8.2f %s" % (bytesize / 1024.0 ** exponent, suffixes[exponent])

    rootfolders = [i for i in os.listdir('.') if os.path.isdir(i)]
    results = [ (foldersize(folder), folder) for folder in rootfolders ]
    for size, folder in sorted(results):
        print "%s\t%s" % (folder, prettier(size))
    print
    print "Total:\t%s" % prettier(sum ( size for size, folder in results ))

    # End

The biggest change I made was to use os.walk rather than os.path.walk.
os.walk is newer, and a bit easier to understand; it takes just a single
directory path as an argument, and returns a nice generator object that
you can use in a for loop to walk the entire tree. I use it in a somewhat
unconventional way here. Look at the docs for a more conventional
application.

The "map(os.path.getsize, files)" code should run a bit faster than a for
loop, because map only has to look up the getsize function once.

I use log in the "prettier" function rather than your chain of ifs. The
chain of ifs might actually be faster. But I spent so long studying math
in school that I like to use it whenever I get a chance.

Some other comments on your code:

> def cmpfunc(a,b):
>     if a.count > b.count:
>         return 1
>     elif a.count == b.count:
>         return 0
>     else:
>         return -1

This could be just "return a.count - b.count".
A cmp function does not have to return exactly -1 or +1, just a
negative number, zero, or a positive number.

> foldersizeobjects.sort(cmpfunc)

You could also use the key parameter; it is usually faster than a cmp
function. As you can see, I used a tuple; tuples compare element by
element, so by default the sort orders them by the first element. Of
course, sorting is not a serious bottleneck in either program.

> tot=0
> for foldersize in foldersizeobjects:
>     tot=tot+foldersize.count
>     print foldersize

"tot +=" is cooler than "tot = tot +". And perhaps a bit faster.
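
For comparison with the recursive foldersize in the reply, here is a minimal sketch of the more conventional os.walk loop that the message points to the docs for. It is written for Python 3 (the code in the message is Python 2), the name folder_size is made up for illustration, and skipping files that cannot be read is an assumption, not something the message specifies.

```python
import os


def folder_size(top):
    """Return the total size in bytes of all files under top."""
    total = 0
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # descending into subdirectories on its own.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Assumption: files that vanish or cannot be read are skipped.
                pass
    return total


if __name__ == "__main__":
    print(folder_size("."))
```

Because os.walk does the descent itself, no explicit recursion is needed, which is the main difference from the version in the message.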
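
The log-based approach in "prettier" can be sketched on its own as follows. This is a Python 3 variant under a few assumptions that are not in the message: sizes below 1 are reported as "0 bytes", anything larger than the last suffix is shown in that suffix, and one extra check guards against floating-point rounding at exact powers of 1024.

```python
import math

SUFFIXES = ['bytes', 'kb', 'mb', 'gb', 'tb']


def prettier(bytesize):
    """Convert a size in bytes to a short human-readable string."""
    if bytesize < 1:
        # math.log raises ValueError for 0, so handle it up front.
        return "0 bytes"
    # Largest power of 1024 that is less than or equal to bytesize.
    exponent = int(math.log(bytesize, 1024))
    # Guard: the float result may land just under an exact integer.
    if bytesize >= 1024 ** (exponent + 1):
        exponent += 1
    # Assumption: cap at the largest known suffix rather than fall back.
    exponent = min(exponent, len(SUFFIXES) - 1)
    return "%.2f %s" % (bytesize / 1024.0 ** exponent, SUFFIXES[exponent])


if __name__ == "__main__":
    for n in (0, 500, 1536, 5 * 1024 ** 2):
        print(n, "->", prettier(n))
```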
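
To illustrate the sorting remarks, the sketch below sorts the same records once with a key function and once with the shortened cmp function. It targets Python 3, where list.sort and sorted no longer accept a cmp argument, so the cmp function is wrapped with functools.cmp_to_key. The FolderSize record and the sample numbers are hypothetical stand-ins, since the original class is not shown in the message.

```python
from collections import namedtuple
from functools import cmp_to_key
from operator import attrgetter

# Hypothetical stand-in for the objects in the quoted code.
FolderSize = namedtuple("FolderSize", ["name", "count"])


def cmpfunc(a, b):
    # Any negative, zero, or positive number works as a cmp result.
    return a.count - b.count


folders = [
    FolderSize("docs", 300),
    FolderSize("src", 120),
    FolderSize("tmp", 7),
]

# Key-based sort: the key is computed once per element.
by_key = sorted(folders, key=attrgetter("count"))

# cmp-based sort: the function is called once per comparison.
by_cmp = sorted(folders, key=cmp_to_key(cmpfunc))

# Running total with a generator expression instead of a manual loop.
total = sum(f.count for f in folders)

if __name__ == "__main__":
    for f in by_key:
        print(f.name, f.count)
    print("Total:", total)
```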