I made a new version that is closer to the original one:

--code--
#!/usr/bin/python

import os, sys, time
import glob, random, Queue
import threading

EXIT = False
BRANDS = {}
LOCK=threading.Lock()
EV=threading.Event()
POOL=Queue.Queue(0)
NRO_THREADS=20

def walkerr(err):
    print err

class Worker(threading.Thread):
    def run(self):
        EV.wait()
        while True:
            try:
                mydir=POOL.get(timeout=1)
                if mydir == None:
                    continue

                for root, dirs, files in os.walk(mydir, onerror=walkerr):
                    if EXIT:
                        break

                    terra_user = 'test'
                    terra_brand = 'test'
                    user_du = '0 a'
                    user_total_files = 0

                    LOCK.acquire()
                    if not BRANDS.has_key(terra_brand):
                        BRANDS[terra_brand] = {}
                        BRANDS[terra_brand]['COUNT'] = 1
                        BRANDS[terra_brand]['SIZE'] = int(user_du.split()[0])
                        BRANDS[terra_brand]['FILES'] = user_total_files
                    else:
                        BRANDS[terra_brand]['COUNT'] = BRANDS[terra_brand]['COUNT'] + 1
                        BRANDS[terra_brand]['SIZE'] = BRANDS[terra_brand]['SIZE'] + int(user_du.split()[0])
                        BRANDS[terra_brand]['FILES'] = BRANDS[terra_brand]['FILES'] + user_total_files
                    LOCK.release()
            except Queue.Empty:
                if EXIT:
                    break
                else:
                    continue
            except KeyboardInterrupt:
                break
            except Exception:
                print mydir
                raise

if len(sys.argv) < 2:
    print 'Usage: %s dir...' % sys.argv[0]
    sys.exit(1)

glob_dirs = []
for i in sys.argv[1:]:
    glob_dirs = glob_dirs + glob.glob(i+'/[a-z_]*')

random.shuffle(glob_dirs)

for x in xrange(NRO_THREADS):
    Worker().start()

try:
    for i in glob_dirs:
        POOL.put(i)

    EV.set()
    while not POOL.empty():
        time.sleep(1)
    EXIT = True
    while (threading.activeCount() > 1):
        time.sleep(1)
except KeyboardInterrupt:
    EXIT=True

for b in BRANDS:
    print '%s:%i:%i:%i' % (b, BRANDS[b]['SIZE'], BRANDS[b]['COUNT'], BRANDS[b]['FILES'])
--code--

And I ran it on several servers:

# uname -r
2.6.18-8.1.15.el5
# python test.py /usr
test:0:2267:0
# python test.py /usr
test:0:2224:0
# python test.py /usr
test:0:2380:0
# python -V
Python 2.4.3

# uname -r
7.0-BETA2
# python test.py /usr
test:0:1706:0
# python test.py /usr
test:0:1492:0
# python test.py /usr
test:0:1524:0
# python -V
Python 2.5.1

# uname -r
2.6.9-42.0.8.ELsmp
# python test.py /usr
test:0:1311:0
# python test.py /usr
test:0:1486:0
# python test.py /usr
test:0:1520:0
# python -V
Python 2.3.4

I really don't know what's happening. Any other ideas?

Regards

Chris Mellon wrote:
> On Nov 13, 2007 1:06 PM, Marcus Alves Grando <[EMAIL PROTECTED]> wrote:
>> Diez B. Roggisch wrote:
>>> Marcus Alves Grando wrote:
>>>
>>>> Diez B. Roggisch wrote:
>>>>> Marcus Alves Grando wrote:
>>>>>
>>>>>> Hello list,
>>>>>>
>>>>>> I have a strange problem with os.walk and threads in a python script. I
>>>>>> have one script that creates some threads and consumes a Queue. For every
>>>>>> value in the Queue this script runs os.walk() and prints the root dir. But
>>>>>> if I increase the number of threads the results are inconsistent compared
>>>>>> with one thread.
>>>>>>
>>>>>> For example, run this code plus sort with one thread, then run it again
>>>>>> with ten threads and see diff(1).
>>>>> I don't see any difference. I ran it with 1 and 10 workers + sorted the
>>>>> output. No diff whatsoever.
>>>> Did you test in one dir with many subdirs? Like /usr or /usr/ports (in
>>>> freebsd) for example?
>>> Yes, over 1000 subdirs/files.
>> Strange, because for me it occurs every time.
>>
>>>>> And I don't know what you mean by diff(1) - was that supposed to be some
>>>>> output?
>>>> No. One thread produces one result and ten threads produce another result
>>>> with fewer lines.
>>>>
>>>> See the example below:
>>>>
>>>> @@ -13774,8 +13782,6 @@
>>>>  /usr/compat/linux/proc/44
>>>>  /usr/compat/linux/proc/45
>>>>  /usr/compat/linux/proc/45318
>>>> -/usr/compat/linux/proc/45484
>>>> -/usr/compat/linux/proc/45532
>>>>  /usr/compat/linux/proc/45857
>>>>  /usr/compat/linux/proc/45903
>>>>  /usr/compat/linux/proc/46
>>> I'm not sure what that directory is, but to me that looks like the
>>> linux /proc dir, containing process ids. Which incidentally changes
>>> between the two runs, as more threads will have process id aliases.
>> My example was not good enough. I ran this script in the ports directory of
>> freebsd and on the imap folders on my linux server, same thing.
>>
>> @@ -182,7 +220,6 @@
>>  /usr/ports/archivers/p5-POE-Filter-Bzip2
>>  /usr/ports/archivers/p5-POE-Filter-LZF
>>  /usr/ports/archivers/p5-POE-Filter-LZO
>> -/usr/ports/archivers/p5-POE-Filter-LZW
>>  /usr/ports/archivers/p5-POE-Filter-Zlib
>>  /usr/ports/archivers/p5-PerlIO-gzip
>>  /usr/ports/archivers/p5-PerlIO-via-Bzip2
>> @@ -234,7 +271,6 @@
>>  /usr/ports/archivers/star-devel
>>  /usr/ports/archivers/star-devel/files
>>  /usr/ports/archivers/star/files
>> -/usr/ports/archivers/stuffit
>>  /usr/ports/archivers/szip
>>  /usr/ports/archivers/tardy
>>  /usr/ports/archivers/tardy/files
>>
>>
>
> Are you just diffing the output? There's no guarantee that
> os.walk() will always have the same order, or that your different
> working threads will produce the same output in the same order. On my
> system, for example, I get a different order of subdirectory output
> when I run with 10 threads than with 1.
>
> walk() requires that stat() works for the next directory that will be
> walked. It might be remotely possible that stat() is failing for some
> reason and some directories are being lost (this is probably not going
> to be reproducible). If you can reproduce it, try using pdb to see
> what's going on inside walk().

-- 
Marcus Alves Grando
marcus(at)sbh.eng.br | Personal
mnag(at)FreeBSD.org  | FreeBSD.org
-- 
http://mail.python.org/mailman/listinfo/python-list
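
Following up on the two points quoted above (output order is not guaranteed, and a failing stat() could drop directories), here is a rough sketch of a comparison that ignores order and records every error os.walk() reports. It is only an illustration, not the script from this thread: the names walk_roots and demo are made up for the example, and it is written for Python 3, where the module is called queue rather than Queue. It also waits for the workers with Queue.join() instead of a global flag. In the test script, EXIT becomes True as soon as POOL is empty, which can happen while workers are still inside os.walk(), and the "if EXIT: break" then ends those walks early; that may be part of why COUNT changes from run to run.

```python
import os
import queue
import tempfile
import threading


def walk_roots(top_dirs, n_threads):
    """Walk each directory in top_dirs using n_threads worker threads.

    Returns (roots, errors): the set of directories os.walk() visited and
    the list of OSError objects it reported through onerror.
    """
    pool = queue.Queue()
    roots = set()
    errors = []
    lock = threading.Lock()

    def record_error(err):
        # os.walk() calls this with an OSError when a directory cannot be listed.
        with lock:
            errors.append(err)

    def worker():
        while True:
            top = pool.get()
            try:
                if top is None:
                    # Sentinel value: no more work for this thread.
                    return
                found = [root for root, dirs, files
                         in os.walk(top, onerror=record_error)]
                with lock:
                    roots.update(found)
            finally:
                # Mark the item as processed so pool.join() can return.
                pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for top in top_dirs:
        pool.put(top)
    for _ in threads:
        pool.put(None)       # one sentinel per worker
    pool.join()              # returns only after every walk has finished
    for t in threads:
        t.join()
    return roots, errors


def demo():
    """Build a small temporary tree and walk it with 1 and then 10 threads."""
    with tempfile.TemporaryDirectory() as base:
        for i in range(5):
            for j in range(4):
                os.makedirs(os.path.join(base, "d%d" % i, "s%d" % j))
        tops = sorted(os.path.join(base, name) for name in os.listdir(base))
        one, errors_one = walk_roots(tops, 1)
        ten, errors_ten = walk_roots(tops, 10)
        return one, ten, errors_one + errors_ten
```

Comparing the two returned sets (one == ten, or one - ten to list what is missing) avoids depending on output order, and a non-empty error list would point at the stat()/listing failures mentioned above.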