I made a new version that is closer to the original one:

--code--
#!/usr/bin/python

import os, sys, time
import glob, random, Queue
import threading

EXIT = False
BRANDS = {}
LOCK=threading.Lock()
EV=threading.Event()
POOL=Queue.Queue(0)
NRO_THREADS=20

def walkerr(err):
        print err

class Worker(threading.Thread):
        def run(self):
                EV.wait()
                while True:
                        try:
                                mydir=POOL.get(timeout=1)
                                if mydir == None:
                                        continue

                                for root, dirs, files in os.walk(mydir, onerror=walkerr):
                                        if EXIT:
                                                break

                                        terra_user = 'test'
                                        terra_brand = 'test'
                                        user_du = '0 a'
                                        user_total_files = 0

                                        LOCK.acquire()
                                        if terra_brand not in BRANDS:
                                                BRANDS[terra_brand] = {}
                                                BRANDS[terra_brand]['COUNT'] = 1
                                                BRANDS[terra_brand]['SIZE'] = int(user_du.split()[0])
                                                BRANDS[terra_brand]['FILES'] = user_total_files
                                        else:
                                                BRANDS[terra_brand]['COUNT'] += 1
                                                BRANDS[terra_brand]['SIZE'] += int(user_du.split()[0])
                                                BRANDS[terra_brand]['FILES'] += user_total_files
                                        LOCK.release()

                        except Queue.Empty:
                                if EXIT:
                                        break
                                else:
                                        continue
                        except KeyboardInterrupt:
                                break
                        except Exception:
                                print mydir
                                raise

if len(sys.argv) < 2:
        print 'Usage: %s dir...' % sys.argv[0]
        sys.exit(1)

glob_dirs = []
for i in sys.argv[1:]:
        glob_dirs = glob_dirs + glob.glob(i+'/[a-z_]*')
random.shuffle(glob_dirs)

for x in xrange(NRO_THREADS):
        Worker().start()

try:
        for i in glob_dirs:
                POOL.put(i)

        EV.set()
        while not POOL.empty():
                time.sleep(1)
        EXIT = True

        while (threading.activeCount() > 1):
                time.sleep(1)
except KeyboardInterrupt:
        EXIT=True

for b in BRANDS:
        print '%s:%i:%i:%i' % (b, BRANDS[b]['SIZE'], BRANDS[b]['COUNT'], BRANDS[b]['FILES'])
--code--
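One possible explanation for the varying COUNT is a hypothesis, not a confirmed diagnosis. The main loop sets EXIT = True as soon as POOL.empty() returns true. At that moment up to NRO_THREADS workers can still be inside os.walk() on directories they have already taken from the queue. The `if EXIT: break` inside the walk loop would then cut those walks short by a different amount on each run.

A minimal sketch of a shutdown that avoids the global flag follows. It queues one None sentinel per worker after all the real directories, then waits with join(). The function count_dirs() and its num_threads parameter are illustrative names that are not part of the original script. It uses acquire()/release() with try/finally, so it should also run on Python 2.4.

```python
import os
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def count_dirs(top_dirs, num_threads=20):
    """Walk every directory in top_dirs with a pool of worker threads and
    return how many roots os.walk() visited in total (hypothetical helper)."""
    pool = queue.Queue()
    lock = threading.Lock()
    totals = {'COUNT': 0}

    def worker():
        while True:
            mydir = pool.get()          # blocks until work or a sentinel arrives
            if mydir is None:           # sentinel: no more work for this thread
                break
            # No exit flag here: each walk always runs to completion.
            for root, dirs, files in os.walk(mydir):
                lock.acquire()
                try:
                    totals['COUNT'] += 1
                finally:
                    lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for d in top_dirs:
        pool.put(d)
    for _ in threads:
        pool.put(None)   # one sentinel per worker, queued behind all real work
    for t in threads:
        t.join()         # returns only after every walk has finished
    return totals['COUNT']
```

Because the queue is FIFO, every sentinel sits behind all the real directories. A worker therefore exits only after the queue has been drained. The total should then be the same for 1 thread as for 20.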

And I ran it on several servers:

# uname -r
2.6.18-8.1.15.el5
# python test.py /usr
test:0:2267:0
# python test.py /usr
test:0:2224:0
# python test.py /usr
test:0:2380:0
# python -V
Python 2.4.3

# uname -r
7.0-BETA2
# python test.py /usr
test:0:1706:0
# python test.py /usr
test:0:1492:0
# python test.py /usr
test:0:1524:0
# python -V
Python 2.5.1

# uname -r
2.6.9-42.0.8.ELsmp
# python test.py /usr
test:0:1311:0
# python test.py /usr
test:0:1486:0
# python test.py /usr
test:0:1520:0
# python -V
Python 2.3.4

I really don't know what's happening.

Any other ideas?

Regards

Chris Mellon wrote:
> On Nov 13, 2007 1:06 PM, Marcus Alves Grando <[EMAIL PROTECTED]> wrote:
>> Diez B. Roggisch wrote:
>>> Marcus Alves Grando wrote:
>>>
>>>> Diez B. Roggisch wrote:
>>>>> Marcus Alves Grando wrote:
>>>>>
>>>>>> Hello list,
>>>>>>
>>>>>> I have a strange problem with os.walk and threads in a Python script. I
>>>>>> have a script that creates some threads that consume a Queue. For every
>>>>>> value in the Queue the script runs os.walk() and prints the root dir. But
>>>>>> if I increase the number of threads, the results are inconsistent compared
>>>>>> with one thread.
>>>>>>
>>>>>> For example, run this code piped through sort with one thread, then run
>>>>>> it again with ten threads and compare the two outputs with diff(1).
>>>>> I don't see any difference. I ran it with 1 and 10 workers + sorted the
>>>>> output. No diff whatsoever.
>>>> Did you test in a dir with many subdirs, like /usr or /usr/ports (on
>>>> FreeBSD), for example?
>>> Yes, over 1000 subdirs/files.
>> Strange, because for me it occurs every time.
>>
>>>>> And I don't know what you mean by diff(1) - was that supposed to be some
>>>>> output?
>>>> No. One thread produces one result and ten threads produce another
>>>> result with fewer lines.
>>>>
>>>> See the example below:
>>>>
>>>> @@ -13774,8 +13782,6 @@
>>>>   /usr/compat/linux/proc/44
>>>>   /usr/compat/linux/proc/45
>>>>   /usr/compat/linux/proc/45318
>>>> -/usr/compat/linux/proc/45484
>>>> -/usr/compat/linux/proc/45532
>>>>   /usr/compat/linux/proc/45857
>>>>   /usr/compat/linux/proc/45903
>>>>   /usr/compat/linux/proc/46
>>> I'm not sure what that directory is, but to me that looks like the
>>> linux /proc dir, containing process ids. Which incidentally changes
>>> between the two runs, as more threads will have process id aliases.
>> My example was not good enough. I ran this script on the ports directory
>> of FreeBSD and on IMAP folders on my Linux server, with the same result.
>>
>> @@ -182,7 +220,6 @@
>>   /usr/ports/archivers/p5-POE-Filter-Bzip2
>>   /usr/ports/archivers/p5-POE-Filter-LZF
>>   /usr/ports/archivers/p5-POE-Filter-LZO
>> -/usr/ports/archivers/p5-POE-Filter-LZW
>>   /usr/ports/archivers/p5-POE-Filter-Zlib
>>   /usr/ports/archivers/p5-PerlIO-gzip
>>   /usr/ports/archivers/p5-PerlIO-via-Bzip2
>> @@ -234,7 +271,6 @@
>>   /usr/ports/archivers/star-devel
>>   /usr/ports/archivers/star-devel/files
>>   /usr/ports/archivers/star/files
>> -/usr/ports/archivers/stuffit
>>   /usr/ports/archivers/szip
>>   /usr/ports/archivers/tardy
>>   /usr/ports/archivers/tardy/files
>>
>>
> 
> Are you just diffing the output? There's no guarantee that
> os.path.walk() will always have the same order, or that your different
> working threads will produce the same output in the same order. On my
> system, for example, I get a different order of subdirectory output
> when I run with 10 threads than with 1.
> 
> walk() requires that stat() works for the next directory that will be
> walked. It might be remotely possible that stat() is failing for some
> reason and some directories are being lost (this is probably not going
> to be reproducible). If you can reproduce it, try using pdb to see
> what's going on inside walk().
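Both of Chris's points can be checked without relying on line order. The sketch below collects the roots visited by os.walk() into a set and records every error passed to onerror. It then compares two runs as sets, so lines that merely moved are not reported as differences. walk_roots() and compare_runs() are illustrative names, not functions from the thread.

One caveat applies to the os.walk() implementations I am aware of. onerror is only called when listing a directory fails. An entry whose type check (stat()) fails is quietly treated as a non-directory, so a lost directory might not appear in the error list.

```python
import os

def walk_roots(top):
    """Return (set of every root os.walk() visits under top, list of errors).

    onerror receives the exception raised when a directory cannot be listed,
    so collecting them shows whether any subtree was skipped."""
    errors = []
    roots = set()
    for root, dirs, files in os.walk(top, onerror=errors.append):
        roots.add(root)
    return roots, errors

def compare_runs(run_a, run_b):
    """Order-independent comparison of two runs (any iterables of paths).

    Returns (only_in_a, only_in_b) as sorted lists; a different visiting
    order alone never counts as a difference."""
    a, b = set(run_a), set(run_b)
    return sorted(a - b), sorted(b - a)
```

For example, consider `compare_runs(open('one.txt').read().splitlines(), open('ten.txt').read().splitlines())`, where the file names are hypothetical saved outputs. It would list only the paths that really differ between a one-thread run and a ten-thread run.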

-- 
Marcus Alves Grando
marcus(at)sbh.eng.br | Personal
mnag(at)FreeBSD.org  | FreeBSD.org
-- 
http://mail.python.org/mailman/listinfo/python-list
