On Fri, 5 Sep 2008 11:35:14 -0300, Walter Cruz <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 5, 2008 at 11:29 AM, Jackie Wang <[EMAIL PROTECTED]> wrote:
>> Here is a html code:
>>
>> <td valign="top" headers="col4">
>>
>> Premier Community Bank of Southwest Florida
>> <br />
>> Fort Myers, FL
>>
>> </td>
>>
>> My question is how I can extract the strings and get the results:
>> Premier Community Bank of Southwest Florida; Fort Myers, FL
>
> Use BeautifulSoup.

I agree, BeautifulSoup is wonderful. Here are snippets of code that I
recently used to locate (in each of many HTML files) the table that
contained a particular heading:

    from BeautifulSoup import BeautifulSoup
    import re
    ...
    inlines = ifd.readlines()
    soup = BeautifulSoup( " ".join( inlines ) )
    x = soup.findAll( text = re.compile( "Technical Requirements - General" ) )
    x = x[0].parent
    while x.name != "table":
        x = x.parent
    tr_list = x.findAll( "tr", recursive = False )
    print "Table has %d rows." % len( tr_list )
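
For the question as originally asked (pull the two strings out of the <td> and join them with "; "), here is a minimal sketch. Note that it is not BeautifulSoup: it swaps in the standard library's html.parser (Python 3) so it runs without installing anything. The names CellTextExtractor and extract_cell_text are made up for illustration, and the sketch assumes that every text chunk inside any <td> is wanted.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collect the non-empty text chunks found inside <td> elements.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <td> elements we are currently inside
        self.chunks = []  # cleaned text pieces, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "td" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and newlines into single spaces.
        text = " ".join(data.split())
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_cell_text(html, sep="; "):
    """Return the text chunks inside <td> cells, joined by sep."""
    parser = CellTextExtractor()
    parser.feed(html)
    parser.close()
    return sep.join(parser.chunks)


# The fragment from the original question.
SAMPLE = """<td valign="top" headers="col4">

Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL

</td>"""

result = extract_cell_text(SAMPLE)
print(result)
```

The <br /> tag splits the cell's text into two separate chunks, which is why joining the chunks with "; " gives the requested form. To restrict the match to one particular cell (say, only headers="col4"), the attrs argument of handle_starttag would be the place to check.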