Hi,
I'm learning Python, jumping in the deep end with a threading application. I came across an authoritative-looking site that recommends using queues for threading in Python.
http://www.ibm.com/developerworks/aix/library/au-threadingpython/index.html
The author provides example code that fetches data from several web sites, using threads. I have modified his code slightly, just adding a couple of print statements and passing an ID number to the thread.

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com", "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
#"""Threaded Url Grab"""
  def __init__(self, queue,i):
    threading.Thread.__init__(self)
    self.queue = queue
    self.num = i
    print "Thread: ",self.num

  def run(self):
    while True:
      #grabs host from queue
      host = self.queue.get()
      print "num, host: ",self.num,host
      #grabs urls of hosts and prints first 1024 bytes of page
      url = urllib2.urlopen(host)
      print url.read(1024)

      #signals to queue job is done
      self.queue.task_done()

start = time.time()
def main():

  #spawn a pool of threads, and pass them queue instance
  for i in range(5):
    t = ThreadUrl(queue,i)
    t.setDaemon(True)
    t.start()

 #populate queue with data
    for host in hosts:
      queue.put(host)

 #wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Executed on Windows with Python 2.5, this program doesn't do what it is meant to do, which is to fetch data from each site once. Instead, it processes the first host in the list 5 times, the next 4 times, and so on, with the last processed just once. I don't know whether the code is simply wrong (which seems unlikely) or the behaviour on my system differs from AIX (which also seems unlikely).
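One likely cause is visible in the listing as posted. The lines `for host in hosts:` and `queue.join()` are indented at the same level as the body of `for i in range(5):`, so Python treats them as part of that loop. Comment lines don't count for indentation, so the one-space `#populate queue with data` comment does not end the loop. As a result, the whole host list is queued again on every pass, once per spawned thread, instead of once in total. The following is a minimal offline sketch of that structure, written for Python 3 (where the module is named `queue`). The `fake_fetch` helper is hypothetical and stands in for `urllib2.urlopen(host).read(1024)`. It is not certain that this alone accounts for the exact counts reported above.

```python
import queue
import threading
from collections import Counter

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

processed = Counter()            # how many times each host was handled
processed_lock = threading.Lock()


def fake_fetch(host):
    """Hypothetical stand-in for urllib2.urlopen(host).read(1024)."""
    with processed_lock:
        processed[host] += 1


def worker(q):
    while True:
        host = q.get()           # blocks until an item is available
        try:
            fake_fetch(host)
        finally:
            q.task_done()        # signal completion even if the fetch fails


def main_as_posted(num_threads=5):
    q = queue.Queue()
    for i in range(num_threads):
        t = threading.Thread(target=worker, args=(q,), daemon=True)
        t.start()
        # Indented as in the posted code: these lines run on every pass
        # of the `for i` loop, so every host is queued num_threads times.
        for host in hosts:
            q.put(host)
        q.join()


main_as_posted()
print(processed)
```

With this indentation every host ends up handled once per pass of the spawning loop, rather than once overall.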

Naively, I would have expected the queue to ensure that each item is processed only once. Is there a simple change that will make this code behave as intended? Or is the author out to lunch?
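`Queue` does hand each item that was `put` to exactly one `get` call, so items are only processed more than once if they are put more than once. The following is a minimal sketch of the intended structure, written for Python 3. It starts all the workers first, then queues each host once and calls `join()` once, outside the thread-spawning loop. The counting stands in for the real download and is a hypothetical placeholder.

```python
import queue
import threading
from collections import Counter

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]


def main(num_threads=5):
    q = queue.Queue()
    handled = Counter()
    lock = threading.Lock()

    def worker():
        while True:
            host = q.get()
            try:
                with lock:       # hypothetical stand-in for fetching `host`
                    handled[host] += 1
            finally:
                q.task_done()    # always mark the item done

    # spawn a pool of daemon threads; each blocks on q.get()
    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()

    # populate the queue once, after (not inside) the spawning loop
    for host in hosts:
        q.put(host)

    # wait until every queued host has been marked task_done()
    q.join()
    return handled


result = main()
print(result)
```

The `try`/`finally` is not in the original code. In the posted version, if `urlopen` raised an exception, that thread would die without calling `task_done()`, and `queue.join()` would then block forever.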

Cheers
Gib