Re: Extract Information from Tables in html

Stefan Behnel Fri, 05 Sep 2008 09:11:25 -0700

Hi,

Jackie Wang wrote:
> Here is a html code:
> 
> <td valign="top" headers="col4">
> 
>          Premier Community Bank of Southwest Florida
>          <br />
>          Fort Myers, FL
> 
> </td>
> 
> My question is how I can extract the strings and get the results:
> Premier Community Bank of Southwest Florida; Fort Myers, FL


Use lxml.html. Something like this should do what you want:

  >>> from lxml import html
  >>> tree = html.parse("http://server.org/thefile.html")
  >>> all_tds = tree.findall("//td")
  >>> for td in all_tds:
  ...     print( td.xpath("normalize-space()") )

Tweak as you see fit; tree iteration is at your service in case you need more.
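
If you want the "; " separator from the question, one possible tweak is to collect the text pieces of each cell separately and join them yourself. This is only a sketch: the snippet is inlined (wrapped in a made-up table row) instead of being fetched from a URL, and the helper name cell_strings is hypothetical, not part of lxml.

```python
from lxml import html

# The cell from the question, inlined so the example runs without a network
# fetch. The surrounding <table><tr> wrapper is an assumption for illustration.
SNIPPET = """
<table><tr>
<td valign="top" headers="col4">

         Premier Community Bank of Southwest Florida
         <br />
         Fort Myers, FL

</td>
</tr></table>
"""


def cell_strings(td):
    """Return the non-empty text pieces of a cell, whitespace collapsed."""
    # itertext() yields each text chunk inside the element in document order,
    # so the text before and after the <br /> arrive as separate strings.
    pieces = (" ".join(text.split()) for text in td.itertext())
    return [piece for piece in pieces if piece]


root = html.fromstring(SNIPPET)
for td in root.iter("td"):
    print("; ".join(cell_strings(td)))
```

For the cell above this prints "Premier Community Bank of Southwest Florida; Fort Myers, FL". Unlike normalize-space(), which flattens the whole cell into one string, this keeps the pieces apart so you can pick your own separator.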

http://codespeak.net/lxml/

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
