Hi,

Jackie Wang wrote:
> Here is a html code:
>
> <td valign="top" headers="col4">
>
> Premier Community Bank of Southwest Florida
> <br />
> Fort Myers, FL
>
> </td>
>
> My question is how I can extract the strings and get the results:
> Premier Community Bank of Southwest Florida; Fort Myers, FL

Use lxml.html. Something like this should do what you want:

>>> from lxml import html
>>> tree = html.parse("http://server.org/thefile.html")
>>> all_tds = tree.findall("//td")
>>> for td in all_tds:
...     print( td.xpath("normalize-space()") )

Tweak as you see fit; tree iteration is at your service in case you need more.

http://codespeak.net/lxml/

Stefan
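
Note that normalize-space() collapses the whole cell into a single string, so the two parts come out separated by a space rather than by the "; " asked for in the question. The following is only a minimal sketch of one way to get that separator, assuming each text chunk around a <br /> should become its own item. The helper name cell_strings and the <table> wrapper around the sample markup are made up for illustration and are not part of the original thread.

```python
from lxml import html

# Sample cell copied from the question; the <table><tr> wrapper is an
# assumption added so the fragment parses as a well-formed table.
SNIPPET = """<table><tr>
<td valign="top" headers="col4">

Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL

</td>
</tr></table>"""


def cell_strings(td):
    """Return the non-empty text chunks of a cell, whitespace-normalized.

    itertext() yields the text before and after each child element
    (here: before and after the <br />), in document order.
    """
    chunks = (" ".join(text.split()) for text in td.itertext())
    return [chunk for chunk in chunks if chunk]


root = html.fromstring(SNIPPET)
for td in root.iter("td"):
    print("; ".join(cell_strings(td)))
```

For a real page, html.parse() with the file or URL, as in the session above, should be able to replace html.fromstring(SNIPPET); the loop over the td elements stays the same.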