Rob Wolfe wrote:
> Steven Bethard <[EMAIL PROTECTED]> writes:
>> I'd hate to steer a potential new Python developer to a clumsier
>
> "clumsier"???
> Try to parse this with your program:
>
> page2 = '''
> <html><head><title>URLs</title></head>
> <body>
> <ul>
> <li><a href="http://domain1/page1">some page1</a></li>
> <li><a href="http://domain2/page2">some page2</a></li>
> </body></html>
> '''

If you want to parse invalid HTML, I strongly encourage you to look into
BeautifulSoup. Here's the updated code:

    import ElementSoup # http://effbot.org/zone/element-soup.htm
    import cStringIO

    tree = ElementSoup.parse(cStringIO.StringIO(page2))
    for a_node in tree.getiterator('a'):
        url = a_node.get('href')
        if url is not None:
            print url

>> I know that the wiki page is supposed to be Python 2.4 only, but I'd
>> rather have no example than an outdated one.
>
> This example is by no means "outdated".

Given the simplicity of the ElementSoup code above, I'd still contend
that using HTMLParser here shows too complex an answer to too simple a
problem.

STeVe
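
P.S. For anyone comparing the two approaches, here is a rough sketch of what the standard-library HTMLParser route looks like on the same page2 input. It is not the code from the wiki page. It assumes Python 3 (html.parser, rather than the Python 2.4 modules discussed in this thread), and the names LinkCollector and extract_urls are made up for illustration.

```python
from html.parser import HTMLParser

# Same deliberately invalid input as in the quoted message
# (the <ul> element is never closed).
page2 = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</body></html>
'''


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None
        # for attributes written without a value.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.urls.append(value)


def extract_urls(html):
    """Hypothetical helper: return all link targets found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


for url in extract_urls(page2):
    print(url)
```

The event-based parser never builds a tree, so the missing </ul> does not bother it, but it needs a subclass and a callback where the ElementSoup version needs a single loop.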