On Sun, 25 May 2008 22:42:06 -0300, <[EMAIL PROTECTED]> wrote:

> def joinSets(set1, set2):
>     for i in set2:
>         set1.add(i)
>     return set1
Use the | operator, or |=

> Traceback (most recent call last):
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 47, in <module>
>     x = scrapeSites("http://www.yahoo.com")
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 31, in scrapeSites
>     site = iterator.next()
> RuntimeError: Set changed size during iteration

You will need two sets: the one you're iterating over, and another collecting
the new urls. Once you finish iterating the first, continue with the new ones;
stop when there are none left.

> def scrapeSites(startAddress):
>     site = startAddress
>     sites = set()
>     iterator = iter(sites)
>     pos = 0
>     while pos < 10:  #len(sites):
>         newsites = scrapeSite(site)
>         joinSets(sites, newsites)
>         pos += 1
>         site = iterator.next()
>     return sites

Try this (untested):

def scrapeSites(startAddress):
    allsites = set()               # all links found so far
    pending = set([startAddress])  # pending sites to examine
    while pending:
        newsites = set()           # new links
        for site in pending:
            newsites |= scrapeSite(site)
        pending = newsites - allsites
        allsites |= newsites
    return allsites

> wtf? im not multithreading or anything so how can the size change here?

You modified the set you were iterating over. Another example of the same
problem:

d = {'a': 1, 'b': 2, 'c': 3}
for key in d:
    d[key+key] = 0

--
Gabriel Genellina
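A minimal, runnable sketch of the two points above: joinSets is just an
in-place set union, and mutating a container while a for loop is iterating
over it is what triggers the RuntimeError. The urls here are made-up examples,
and the snippet assumes Python 2.6+ syntax rather than the original poster's
2.5 (where the "except ... as" form is not available):

# joinSets(set1, set2) does an in-place union; the set type already has that:
urls_a = set(["http://www.yahoo.com", "http://example.com"])  # made-up urls
urls_b = set(["http://example.com", "http://example.org"])

urls_a |= urls_b            # mutates urls_a, same effect as the joinSets loop
merged = urls_a | urls_b    # the | operator builds a brand-new set instead

# Adding to a container while a for loop is iterating over it raises
# RuntimeError, which is exactly what happened to the set in scrapeSites:
d = {'a': 1, 'b': 2, 'c': 3}
try:
    for key in d:
        d[key + key] = 0    # adds new keys while the loop is still running
except RuntimeError as exc:
    print(exc)              # "dictionary changed size during iteration" on CPython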