Hi all,

Fairly new to Python, but I have programmed in other languages (C, Java) before. I was experimenting with a Python program that needed to take a directory tree and get the total disk usage of every file (and subfolder) underneath it. The solution also has to run on Windows Server 2003 for work, and it is accessing a NAS shared via CIFS. A sample folder I'm using contains about 26,000 subfolders and 435,000 files.

The original solution I came up with was elegant, but extremely slow (compared to right-clicking the folder tree in Windows Explorer and clicking Properties). It looked something like this:

    import os

    folder = r'Z:\foldertree'
    folder_size = 0
    for (path, dirs, files) in os.walk(folder):
        for file in files:
            folder_size += os.path.getsize(os.path.join(path, file))
I profiled the above code, and os.stat was taking up roughly 90% of the time. After digging around, I found some code in another post that uses win32api to make the Windows API calls directly (if you are interested, search for "supper fast walk" -- yes, "super" is misspelled). To my surprise, the average time is now about 1/7th of what it used to be. A rough sketch of that FindFiles-style walk is below.

I believe the problem is that my simple solution had to call os.stat twice for every file and folder in the tree: once inside os.walk, and once when I called os.path.getsize. I understand that os.stat works on any OS, but in my opinion the expense should not make that dramatic a difference. Is there an OS-agnostic way to get this to run faster (perhaps something like the portable sketch at the end)? Also, if I wanted to port this to Linux or some other OS, is os.stat as expensive there? If so, are there other libraries (like win32api) to help do these operations faster?
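For reference, here is a minimal sketch of the FindFiles-style walk. This is not the exact code from that post; it assumes pywin32, and the tuple indices follow the WIN32_FIND_DATA layout that win32api.FindFiles returns:

    import win32api
    import win32con

    def get_tree_size_win32(path):
        """Total size of files under path, one FindFiles call per directory."""
        total = 0
        # FindFiles returns WIN32_FIND_DATA tuples for the whole directory
        # in a single listing, so no per-file stat call is needed.
        for info in win32api.FindFiles(path + '\\*'):
            attrs = info[0]   # dwFileAttributes
            name = info[8]    # cFileName
            if name in ('.', '..'):
                continue
            if attrs & win32con.FILE_ATTRIBUTE_DIRECTORY:
                # Note: this naive sketch will also follow junction points.
                total += get_tree_size_win32(path + '\\' + name)
            else:
                # nFileSizeHigh / nFileSizeLow
                total += (info[4] << 32) + info[5]
        return total

    print(get_tree_size_win32(r'Z:\foldertree'))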
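And here is a portable sketch that attacks the double-stat problem directly: os.walk stats each entry to classify it as a file or directory, and os.path.getsize stats it again, so replacing the pair with os.listdir plus a single os.lstat per entry should roughly halve the stat calls on any OS. Untested, and the function name is mine:

    import os
    import stat

    def get_tree_size(path):
        """Total size of files under path, exactly one lstat per entry."""
        total = 0
        for name in os.listdir(path):
            full = os.path.join(path, name)
            st = os.lstat(full)   # the only stat call for this entry
            if stat.S_ISDIR(st.st_mode):
                total += get_tree_size(full)
            else:
                total += st.st_size
        return total

    print(get_tree_size(r'Z:\foldertree'))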