"George" <[EMAIL PROTECTED]> wrote:

> I'm very new to python and I have tried to read the tutorials but I am
> unable to understand exactly how I must do this problem.
>
> Specifically, the showIPnums function takes a URL as input, calls the
> read_page(url) function to obtain the entire page for that URL, and
> then lists, in sorted order, the IP addresses implied in the "<A
> HREF=· · ·>" tags within that page.
>
>
> """
> Module to print IP addresses of tags in web file containing HTML
>
> >>> showIPnums('http://22c118.cs.uiowa.edu/uploads/easy.html')
> ['0.0.0.0', '128.255.44.134', '128.255.45.54']
>
> >>> showIPnums('http://22c118.cs.uiowa.edu/uploads/pytorg.html')
> ['0.0.0.0', '128.255.135.49', '128.255.244.57', '128.255.30.11',
> '128.255.34.132', '128.255.44.51', '128.255.45.53',
> '128.255.45.54', '129.255.241.42', '64.202.167.129']
>
> """
>
> def read_page(url):
>     import formatter
>     import htmllib
>     import urllib
>
>     htmlp = htmllib.HTMLParser(formatter.NullFormatter())
>     htmlp.feed(urllib.urlopen(url).read())
>     htmlp.close()
>
> def showIPnums(URL):
>     page=read_page(URL)
>
> if __name__ == '__main__':
>     import doctest, sys
>     doctest.testmod(sys.modules[__name__])

You forgot to mention that you don't want duplicates in the result.
Here's a function that passes the doctest:

from urllib import urlopen
from urlparse import urlsplit
from socket import gethostbyname
from BeautifulSoup import BeautifulSoup

def showIPnums(url):
    """Return the unique IPs found in the anchors of the webpage at the
    given url.

    >>> showIPnums('http://22c118.cs.uiowa.edu/uploads/easy.html')
    ['0.0.0.0', '128.255.44.134', '128.255.45.54']
    >>> showIPnums('http://22c118.cs.uiowa.edu/uploads/pytorg.html')
    ['0.0.0.0', '128.255.135.49', '128.255.244.57', '128.255.30.11',
    '128.255.34.132', '128.255.44.51', '128.255.45.53',
    '128.255.45.54', '129.255.241.42', '64.202.167.129']
    """
    # A set drops duplicate addresses.
    hrefs = set()
    for link in BeautifulSoup(urlopen(url)).fetch('a'):
        try:
            # urlsplit(...)[1] is the network location (host) part of the URL.
            hrefs.add(gethostbyname(urlsplit(link["href"])[1]))
        except Exception:
            # Skip anchors without an href and hosts that fail to resolve.
            pass
    return sorted(hrefs)

HTH,
George
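
For anyone reading this without BeautifulSoup installed, the same idea (collect the anchors, resolve each host, keep a set, sort at the end) can probably be sketched with only the Python 3 standard library. This is a rough sketch, not a drop-in replacement for the function above: the names AnchorCollector and ips_in_html and the resolve parameter are made up for illustration, it takes the page text rather than a URL, and it uses the hostname part of each link, so a port number in the link is ignored.

```python
from html.parser import HTMLParser
from socket import gethostbyname
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over tag and attribute names in lowercase,
        # so <A HREF=...> and <a href=...> are treated alike.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def ips_in_html(html, resolve=gethostbyname):
    """Return the sorted, unique IPs of the hosts linked from html.

    resolve is a hypothetical hook (hostname -> IP string) so the
    function can be tried out without touching DNS.
    """
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()

    ips = set()  # a set drops duplicate addresses
    for href in parser.hrefs:
        try:
            # Relative links have no hostname; pass "" in that case.
            ips.add(resolve(urlsplit(href).hostname or ""))
        except (OSError, ValueError):
            # Unresolvable host or malformed URL: skip it.
            pass
    return sorted(ips)  # string sort, same ordering as the doctest output
```

To use it on a live page you would first fetch and decode the text yourself, for example with urllib.request.urlopen(url).read().decode(), and then call ips_in_html(page). Passing your own resolve function is handy for testing, since the real lookup depends on the network.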