Re: parsing tables with beautiful soup?

Duncan Booth Thu, 22 Mar 2007 08:01:07 -0800

"cjl" <[EMAIL PROTECTED]> wrote:

> This works:
> 
> for row in soup.find("table",{"class": "class_name"}):
>      for cell in row:
>           print cell.contents[0]
> 
> Is there a better way to do this?
>


It may work for the page you are testing against, but it is fragile. You 
are assuming that the TR elements are direct children of the TABLE, but in 
HTML rows belong inside THEAD, TBODY or TFOOT elements (the TBODY tags may 
be omitted from the markup, but the element is implied anyway), so if anyone 
ever writes those tags out explicitly your code will break.

Something like this (untested) ought to work and be reasonably robust:

table = soup.find("table",{"class": "class_name"})
for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.findAll(text=True)
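
For anyone who wants to try this against a table that does use THEAD and TBODY, here is a self-contained sketch. It assumes the newer bs4 package (where findAll is spelled find_all) rather than the Beautiful Soup release current when this was posted, and the sample markup, the class name and the helper names table_rows and direct_child_tags are made up for illustration.

```python
# Sketch only: assumes the bs4 package (pip install beautifulsoup4).
# The HTML sample and both helper functions are hypothetical.
from bs4 import BeautifulSoup, Tag

HTML = """
<table class="class_name">
  <thead><tr><th>Name</th><th>Qty</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>3</td></tr>
    <tr><td>pear</td><td><b>12</b></td></tr>
  </tbody>
</table>
"""


def direct_child_tags(html, class_name):
    """Names of the tags that are direct children of the matching table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": class_name})
    # Skip the whitespace strings that sit between the child tags.
    return [child.name for child in table.children if isinstance(child, Tag)]


def table_rows(html, class_name):
    """Text of every TD, grouped by row, for the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": class_name})
    if table is None:
        return []
    rows = []
    # find_all searches all descendants, so rows inside TBODY are found.
    for row in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # header rows hold TH cells, so they come out empty
            rows.append(cells)
    return rows


print(direct_child_tags(HTML, "class_name"))  # the TRs are not direct children
print(table_rows(HTML, "class_name"))
```

Because find_all searches every descendant, it would also pick up rows of a table nested inside a cell, so pages with nested tables need extra care.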

