> pics = re.compile(r"images/.*\.jpeg")

While I'm not sure if this is the issue, you might be having some trouble with the greediness of the "*" repeater here. HTML like
    <img src="images/1.jpeg"><img src="hello.jpeg">

will yield a result of

    images/1.jpeg"><img src="hello.jpeg

rather than the expected

    images/1.jpeg

You can make it "stingy" (non-greedy) by putting a question mark after the "*":

    r"images/.*?\.jpeg"

I also don't know whether they all come back as "jpeg" or whether some come back as "jpg", in which case you might want to use

    r"images/.*?\.jpe?g"

This can still bork up on things like

    <img src="images/a.gif"><img src="2.jpeg">

where the stingy match starts at the .gif and has to run on past it to reach a ".jpeg", giving you images/a.gif"><img src="2.jpeg".

My first thought would be to install the BeautifulSoup parser and use it to snag all the <img> tags in your document. Then you know you're getting just the tags and, in turn, just their associated "src" attributes.

I do something like that in my comic-snatcher (it scrapes comics from various sites so I can read them all in one place in one sitting). You're welcome to remash this code excerpt (there's no guarantee it's great code):

    import re
    import urllib2
    import BeautifulSoup

    # url, referer and targetRegex are set earlier in the script
    req = urllib2.Request(url)
    req.add_header("Referer", referer)
    page = urllib2.urlopen(req)
    bs = BeautifulSoup.BeautifulSoup()
    for line in page.readlines():
        bs.feed(line)
    bs.done()
    r = re.compile(targetRegex)
    # the "src" of every <img> tag in the page
    imageURLs = [img["src"] for img in bs.fetch("img")]
    # keep only the ones matching targetRegex; the loop variable isn't
    # called "url" because in Python 2 it would leak out and clobber
    # the page URL above
    targetImageURL = [src for src in imageURLs if r.match(src)]

It does blithely assume every image has a "src" attribute, as it should; if not, you can add an "if" clause to the assignment of imageURLs so it only takes the ones that have src attributes.

As others have mentioned, once you successfully get back the list of images, you'll likely want to *extend()* your master list of image URLs with your list of currently-found URLs rather than *append()* it; otherwise you'll end up with a list of lists, which may not be what you want.

Just a few ideas you might want to try.

-tkc

-- 
http://mail.python.org/mailman/listinfo/python-list
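For anyone who wants to see the greedy vs. stingy difference from the post for themselves, here's a small sketch. It's written for Python 3 (the post's code is Python 2, but `re` behaves the same way for these patterns). The HTML strings are the examples from the post; the variable names are just made up for illustration.

```python
import re

html = '<img src="images/1.jpeg"><img src="hello.jpeg">'

# Greedy: ".*" runs as far as it can, back to the LAST ".jpeg" on the line.
greedy = re.compile(r"images/.*\.jpeg")
# Stingy (non-greedy): ".*?" stops at the FIRST ".jpeg" it can reach.
stingy = re.compile(r"images/.*?\.jpeg")
# Same, but the optional "e" also accepts the ".jpg" spelling.
stingy_jpg = re.compile(r"images/.*?\.jpe?g")

print(greedy.findall(html))   # ['images/1.jpeg"><img src="hello.jpeg']
print(stingy.findall(html))   # ['images/1.jpeg']
print(stingy_jpg.findall('<img src="images/3.jpg">'))  # ['images/3.jpg']

# The case a regex still gets wrong: the match starts at the .gif and has
# to run past it to find a ".jpeg", swallowing markup on the way.
tricky = '<img src="images/a.gif"><img src="2.jpeg">'
print(stingy_jpg.findall(tricky))  # ['images/a.gif"><img src="2.jpeg']
```

That last line is why reaching for an HTML parser is the more robust route.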
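If installing BeautifulSoup isn't an option, the same "grab the <img> tags, then look at their src" idea from the post can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not the BeautifulSoup code from the post, and it assumes Python 3. `ImgSrcCollector` and `image_urls` are hypothetical names; `target_regex` plays the role of `targetRegex` in the excerpt. It also includes the "if" clause mentioned in the post, skipping <img> tags that have no src.

```python
import re
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value)
        # pairs, where value is None for a bare attribute like <img src>.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # the "if" clause: skip <img> tags without a usable src
                self.srcs.append(src)


def image_urls(html, target_regex):
    """Return the <img src> values in html that match target_regex."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    r = re.compile(target_regex)
    return [src for src in collector.srcs if r.match(src)]


page = ('<img src="images/a.gif"><img src="2.jpeg">'
        '<img alt="no src here"><img src="images/1.jpeg"/>')
print(image_urls(page, r"images/.*\.jpe?g$"))  # ['images/1.jpeg']
```

Because each src value is examined on its own, even a greedy `.*` can no longer run across tag boundaries; the `$` just insists that the whole value ends in .jpeg or .jpg. Self-closing tags like `<img .../>` are handled too, since `HTMLParser`'s default `handle_startendtag` calls `handle_starttag`.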
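And a tiny illustration of the extend() vs. append() point from the post, using made-up lists standing in for the URLs found on each page:

```python
# Hypothetical per-page results, as a scraper loop might produce them.
pages = [["images/1.jpeg", "images/2.jpeg"], ["images/3.jpg"]]

appended = []
extended = []
for found in pages:
    appended.append(found)  # adds the whole list as ONE element
    extended.extend(found)  # adds each URL as its own element

print(appended)  # [['images/1.jpeg', 'images/2.jpeg'], ['images/3.jpg']]
print(extended)  # ['images/1.jpeg', 'images/2.jpeg', 'images/3.jpg']
```

For lists, `master += found` does the same thing as `master.extend(found)`.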