Tom,

Thanks for the reply, and sorry for the delay in getting back to you. Thanks for pointing out my logic problem; I had added the second part of the if statement at the last minute...
Yes, I have a single-threaded version. It's several hundred lines and uses COM to write the results out to an Excel spreadsheet. I was trying to better understand threading and queues before I started hacking on my current code... maybe that was a mistake. Hey, I'm still learning, and I learn a lot just by reading stuff posted to this group. I hope at some point I can help others in the same way.

Here are the relevant parts of the code (no COM stuff). Here is a summary:

# see if url exists
# if exists then
#     hit page
#     get text of page
#     see if text of page contains search terms
#     if it does then
#         update appropriate counters and lists
#     else update static line and do the next one
# when done with Links list
#     - calculate totals and times
#     - write info to xls file
# end.

#
# utils are functions and classes that I wrote
#
from utils import PrintStatic, HttpExists2
#
# My version of 'easyExcel' with extensions and improvements.
#
import excelled
import urllib2
import time
import socket
import os
#import msvcrt # for printstatic
from datetime import datetime
import pythoncom
from sys import exc_info, stdout, argv, exit

# search terms to use for matching.
#primarySearchTerm = 'Narrow your'
ST_lookingFor = 'Looking for Something'
ST_errorConnecting = 'there has been an error connecting'
ST_zeroMatch = 'You found 0 products'
ST_zeroMatch2 = 'There are no products matching your selection'

#initialize Globals
timeout = 90            # sets timeout for urllib2.urlopen()
failedlinks = []        # list for failed urls
zeromatch = []          # list for 0 result searches
pseudo404 = []          # list for shop.com 404 pages
t = 0                   # used to store starting time for getting a page.
count = 0               # number of tests so far
pagetime = 0            # time it took to load page
slowestpage = (0, '')   # (slowest page time, url)
fastestpage = (10, '')  # (fastest page time, url)
cumulative = 0          # total time to load all pages (used to calc. avg)

#version number of the program
version = 'B2.9'

def ShopCom404(testUrl):
    """ checks url for shop.com 404 url
    shop.com 404 url -- returns status 200
    http://www.shop.com/amos/cc/main/404/ccsyn/260
    """
    return '404' in testUrl

##### main program #####

# (assumed) the test file name is the first command-line argument
testfile = argv[1]
try:
    links = open(testfile).readlines()
except:
    exc, err, tb = exc_info()
    print 'There is a problem with the file you specified. Check the file and re-run the program.\n'
    #print str(exc)
    print str(err)
    print
    exit()

# timeout in seconds
socket.setdefaulttimeout(timeout)
totalNumberTests = len(links)
print 'URLCheck ' + version + ' by Greg Moore (c) 2005 Shop.com\n\n'

# asctime() returns a human readable time stamp whereas time() doesn't
startTimeStr = time.asctime()
start = datetime.today()

for url in links:
    count = count + 1
    #HttpExists2 - checks to see if URL exists and detects redirection.
    # handles 404's and exceptions better. Returns tuple depending on results:
    # if found: true and final url. if not found: false and attempted url
    pgChk = HttpExists2(url)
    if pgChk[0] == False:
        #failed url Exists
        failedlinks.append(pgChk[1])
    elif ShopCom404(pgChk[1]):
        #Our version of a 404
        pseudo404.append(url)
    else:
        #valid page and not a 404, so get the page and check it.
        try:
            t = time.time()
            urlObj = urllib2.urlopen(url)
            pagetime = time.time() - t
            webpg = urlObj.read()
            if (ST_zeroMatch in webpg) or (ST_zeroMatch2 in webpg):
                zeromatch.append(url)
            elif ST_errorConnecting in webpg:
                # for some reason we got the error page
                # so add it to the failed urls
                failmsg = 'Error Connecting Page with: ' + url
                failedlinks.append(failmsg)
        except:
            print 'exception with: ' + url

        #figure page times (compare against the time stored in each tuple)
        cumulative += pagetime
        if pagetime > slowestpage[0]:
            slowestpage = pagetime, url.strip()
        if pagetime < fastestpage[0]:
            fastestpage = pagetime, url.strip()

    msg = 'testing ' + str(count) + ' of ' + str(totalNumberTests) + \
          '. Current runtime: ' + str(datetime.today() - start)
    # status message that updates the same line.
    #PrintStatic(msg)

### Now write out results
end = datetime.today()
finished = datetime.today()
finishedTimeStr = time.asctime()
avg = cumulative/totalNumberTests
failed = len(failedlinks)
nomatches = len(zeromatch)

#setup COM connection to Excel and write the spreadsheet.

If I understand what I've read about threading, I need to convert much of the above into a function and then call threading.Thread's start() or run() to fire off each thread. But where and how, and how to limit it to X number of threads, is the part I get lost on. The examples I've seen using queues and threads never show a list (sequence) as the source data, and I'm not sure where I'd use the Queue stuff, or for that matter whether I'm just complicating the issue.

Once again thanks for the help.

Greg.
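For reference, here is a minimal sketch of the usual worker-pool pattern, written for Python 3 (under Python 2 the module is named `Queue` rather than `queue`). `check_url` and `run_checks` are hypothetical names; `check_url` stands in for the per-URL body of the loop above (HttpExists2, urlopen, the search-term tests) and only looks at the URL text so the sketch runs without network access. The list from `readlines()` is loaded into a queue, and a fixed pool of `num_threads` threads pulls from it, which is what limits you to X threads.

```python
import queue
import threading


def check_url(url):
    """Hypothetical stand-in for the per-URL work (HttpExists2, urlopen,
    search-term matching). It only inspects the url text, so no network."""
    if '404' in url:
        return url, 'pseudo404'
    return url, 'ok'


def worker(work_q, results, lock, check):
    # Pull urls until the None sentinel shows up, then let the thread end.
    while True:
        url = work_q.get()
        if url is None:
            break
        try:
            result = check(url)
        except Exception as exc:
            # Mirrors the original "exception with: url" handling.
            result = (url, 'exception: %s' % exc)
        with lock:  # guard the shared results list
            results.append(result)


def run_checks(links, num_threads=5, check=check_url):
    work_q = queue.Queue()
    results = []
    lock = threading.Lock()

    # Load the whole list (sequence) into the queue up front.
    for url in links:
        work_q.put(url.strip())
    # One sentinel per worker so every thread stops once the list is used up.
    for _ in range(num_threads):
        work_q.put(None)

    threads = [threading.Thread(target=worker,
                                args=(work_q, results, lock, check))
               for _ in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # wait for every worker before computing totals
    return results  # order is not guaranteed


if __name__ == '__main__':
    links = ['http://example.com/a\n',
             'http://www.shop.com/amos/cc/main/404/ccsyn/260\n']
    for url, status in run_checks(links, num_threads=2):
        print(url, status)
```

The totals, timing, and Excel/COM writing would then stay in the main thread after `run_checks` returns. That is likely simpler than doing COM work inside the workers, since each thread that uses COM generally needs its own `pythoncom.CoInitialize()` call.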