Re: SimplePrograms challenge

Steven Bethard Tue, 12 Jun 2007 15:16:56 -0700

Rob Wolfe wrote:
> Steven Bethard <[EMAIL PROTECTED]> writes:
>> I'd hate to steer a potential new Python developer to a clumsier
> 
> "clumsier"???
> Try to parse this with your program:
> 
> page2 = '''
>      <html><head><title>URLs</title></head>
>      <body>
>      <ul>
>      <li><a href="http://domain1/page1">some page1</a></li>
>      <li><a href="http://domain2/page2">some page2</a></li>
>      </body></html>
>      '''


If you want to parse invalid HTML, I strongly encourage you to look into 
BeautifulSoup. Here's the updated code:

     import ElementSoup # http://effbot.org/zone/element-soup.htm
     import cStringIO

     tree = ElementSoup.parse(cStringIO.StringIO(page2))
     for a_node in tree.getiterator('a'):
         url = a_node.get('href')
         if url is not None:
             print url

>> I know that the wiki page is supposed to be Python 2.4 only, but I'd
>> rather have no example than an outdated one.
> 
> This example is by no means "outdated".

Given the simplicity of the ElementSoup code above, I'd still contend 
that using HTMLParser here shows too complex an answer to too simple a 
problem.
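
To make the comparison concrete, here is roughly what a standard-library-only version could look like. Treat it as a sketch, not Rob's actual code: it is written for Python 3's html.parser (the successor to the HTMLParser module), and the names LinkCollector and extract_urls are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.urls.append(value)


def extract_urls(html_text):
    """Feed the markup to the parser and return the collected URLs."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


# Same markup as the quoted page2, including the missing </ul>.
page2 = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</body></html>
'''

for url in extract_urls(page2):
    print(url)
```

It copes with the unclosed list, but it takes a subclass, a callback and explicit state to do what the tree-based loop does in a few lines.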

STeVe
