Re: Extract Information from Tables in html

Peter Pearson Fri, 05 Sep 2008 08:08:17 -0700

On Fri, 5 Sep 2008 11:35:14 -0300, Walter Cruz <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 5, 2008 at 11:29 AM, Jackie Wang <[EMAIL PROTECTED]> wrote:
>> Here is a html code:
>>
>> <td valign="top" headers="col4">
>>
>>         Premier Community Bank of Southwest Florida
>>         <br />
>>         Fort Myers, FL
>>
>> </td>
>>
>> My question is how I can extract the strings and get the results:
>> Premier Community Bank of Southwest Florida; Fort Myers, FL
>
> Use BeautifulSoup.


I agree, BeautifulSoup is wonderful.  Here are snippets of
code that I recently used to locate (in each of many HTML
files) the table that contained a particular heading:

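  from BeautifulSoup import BeautifulSoup
  import re
  ...
  # ifd is assumed to be an open file object for one of the HTML files.
  inlines = ifd.readlines()
  soup = BeautifulSoup( " ".join( inlines ) )
  # Find the text node holding the heading, then climb to its enclosing table.
  x = soup.findAll( text = re.compile( "Technical Requirements - General" ) )
  x = x[0].parent
  while x.name != "table":
    x = x.parent
  # Count only the table's direct <tr> children, not rows of nested tables.
  tr_list = x.findAll( "tr", recursive = False )
  print "Table has %d rows." % len( tr_list )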
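
For the single cell in the original question, a smaller sketch may be
enough. The version below deliberately swaps BeautifulSoup for the
standard library's html.parser module (Python 3), so it runs without
installing anything. The names CellText and extract_cells are made up
for illustration, and it assumes that <td> elements are not nested
inside one another:

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text of every <td>, split into pieces at <br> tags."""

    def __init__(self):
        super().__init__()
        self.cells = []        # one list of text pieces per <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br />.
        if tag == "td":
            self._in_td = True
            self.cells.append([""])
        elif tag == "br" and self._in_td:
            self.cells[-1].append("")   # a <br> starts a new piece

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1][-1] += data


def extract_cells(html):
    """Return one '; '-joined string per <td> found in html."""
    parser = CellText()
    parser.feed(html)
    parser.close()
    results = []
    for cell in parser.cells:
        # Collapse runs of whitespace and drop pieces that end up empty.
        pieces = [" ".join(piece.split()) for piece in cell]
        results.append("; ".join(p for p in pieces if p))
    return results


sample = """<td valign="top" headers="col4">

        Premier Community Bank of Southwest Florida
        <br />
        Fort Myers, FL

</td>"""

print(extract_cells(sample))
# ['Premier Community Bank of Southwest Florida; Fort Myers, FL']
```

For messy real-world pages BeautifulSoup is still likely the more
forgiving choice; this sketch only covers the simple case shown above.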


-- 
To email me, substitute nowhere->spamcop, invalid->net.
--
http://mail.python.org/mailman/listinfo/python-list
