Re: Beautiful Soup Table Parsing

Andreas Perstinger Thu, 09 Aug 2012 00:29:26 -0700

On 09.08.2012 01:58, Tom Russell wrote:

For instance, this code:


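from urllib2 import urlopen              # Python 2 (urllib.request in Python 3)
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (with bs4: from bs4 import BeautifulSoup)

soup = BeautifulSoup(urlopen('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar'))

table = soup.find("table", {"class": "mdcTable"})
for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.findAll(text=True)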

prints a list for each cell, like this:


[snip]

What I want is to get only the data for NYSE and nothing else, and I do
not know whether that's possible. I also want to do something like:

if cell.contents[0] == "Advances":
    Advances = next cell or whatever??  # <-- this part I am not sure how to do

Can someone point me in the right direction for getting the first data
point in the Advances row? There are other values I will need as well,
but I figure once I understand how to do this I can do the rest.


To get the NYSE header row, you could do something like:

header_row = table.find(lambda tag: tag.name == "tr" and tag.td is not None
                        and tag.td.string == "NYSE")

From there you can look for the next row you are interested in:

advances_row = header_row.findNextSibling(
    lambda tag: tag.td is not None and tag.td.string == "Advances")
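A minimal, self-contained sketch of the same idea, assuming the newer bs4 package (the thread doesn't say which version is in use) and run against a small hypothetical table shaped like the WSJ one rather than the live page. bs4 spells the methods find_next_sibling and find_all; the camelCase names are the BeautifulSoup 3 spellings. It also covers the "first data point" part: once you have the Advances row, its second td holds the first value. The helper names first_cell_is and first_value, and the sample numbers, are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the WSJ trading-diary table; the real
# page's labels, layout and numbers may differ.
HTML = """
<table class="mdcTable">
  <tr><td>NYSE</td><td>Latest Close</td><td>Previous Close</td></tr>
  <tr><td>Issues traded</td><td>3,100</td><td>3,090</td></tr>
  <tr><td>Advances</td><td>1,800</td><td>1,200</td></tr>
  <tr><td>NASDAQ</td><td>Latest Close</td><td>Previous Close</td></tr>
  <tr><td>Advances</td><td>1,500</td><td>900</td></tr>
</table>
"""


def first_cell_is(label):
    """Hypothetical helper: match a <tr> whose first <td> text is `label`."""
    def match(tag):
        # Check the tag name first so tags without a <td> never reach .td.get_text()
        return (tag.name == "tr" and tag.td is not None
                and tag.td.get_text(strip=True) == label)
    return match


def first_value(html, section, label):
    """Hypothetical helper: first data cell of the `label` row after the `section` header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "mdcTable"})
    header_row = table.find(first_cell_is(section))
    if header_row is None:
        return None
    row = header_row.find_next_sibling(first_cell_is(label))
    if row is None:
        return None
    cells = row.find_all("td")
    # cells[0] is the row label itself; cells[1] is the first data point
    return cells[1].get_text(strip=True) if len(cells) > 1 else None


print(first_value(HTML, "NYSE", "Advances"))  # 1,800
```

Note that find_next_sibling keeps searching past the next exchange's header row, so if the NYSE block had no Advances row, this would return the one from the following section.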


You could also iterate through all the following sibling rows of header_row:

for row in header_row.findNextSiblings("tr"):
    # do something with each row; printing it is just a placeholder
    print(row)
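For the "NYSE only" part, another option (a different technique from the sibling navigation, sketched here in plain Python) is to flatten the table into a list of rows first, each a list of cell strings; with bs4 that could be [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in table.find_all("tr")]. Picking out one exchange then becomes list processing. This assumes section header rows can be recognized by their first cell; the section_names set, the sample rows and the helper names are hypothetical.

```python
def section_rows(rows, section, section_names):
    """Hypothetical helper: rows after the `section` header row, stopping at
    the next row whose first cell is another section name."""
    collected = []
    inside = False
    for row in rows:
        label = row[0] if row else ""
        if label in section_names:
            if inside:
                break                      # reached the next exchange's block
            inside = (label == section)
            continue                       # skip the header row itself
        if inside:
            collected.append(row)
    return collected


def as_dict(rows):
    """Hypothetical helper: map each row label to its first data value."""
    return {row[0]: (row[1] if len(row) > 1 else None) for row in rows if row}


# Hypothetical rows, as they might come out of the flattened table
rows = [
    ["NYSE", "Latest Close", "Previous Close"],
    ["Issues traded", "3,100", "3,090"],
    ["Advances", "1,800", "1,200"],
    ["Declines", "1,200", "1,800"],
    ["NASDAQ", "Latest Close", "Previous Close"],
    ["Advances", "1,500", "900"],
]

nyse = as_dict(section_rows(rows, "NYSE", {"NYSE", "NASDAQ"}))
print(nyse["Advances"])  # 1,800
```

Stopping at the next section header is what keeps the NASDAQ block out of the result; if the real table marks its header rows some other way, that check is the part to change.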

Bye, Andreas
--
http://mail.python.org/mailman/listinfo/python-list
