I am parsing a web page at http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar using BeautifulSoup.
My problem is that I can parse down to the table where the data I want resides, but I cannot figure out how to grab the contents of the cell next to a given row header. For instance, this code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup(urlopen('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar'))
    table = soup.find("table", {"class": "mdcTable"})
    for row in table.findAll("tr"):
        for cell in row.findAll("td"):
            print cell.findAll(text=True)

prints one list per cell, like this (grouped here by table row for readability):

    [u'NYSE'] [u'Latest close'] [u'Previous close'] [u'Week ago']
    [u'Issues traded'] [u'3,114'] [u'3,136'] [u'3,134']
    [u'Advances'] [u'1,529'] [u'1,959'] [u'1,142']
    [u'Declines'] [u'1,473'] [u'1,070'] [u'1,881']
    [u'Unchanged'] [u'112'] [u'107'] [u'111']
    [u'New highs'] [u'141'] [u'202'] [u'222']
    [u'New lows'] [u'15'] [u'11'] [u'42']
    [u'Adv. volume*'] [u'375,422,072'] [u'502,402,887'] [u'345,372,893']
    [u'Decl. volume*'] [u'245,106,870'] [u'216,507,612'] [u'661,578,907']
    [u'Total volume*'] [u'637,047,653'] [u'728,170,765'] [u'1,027,754,710']
    [u'Closing tick'] [u'+131'] [u'+102'] [u'-505']
    [u'Closing Arms (TRIN)\x86'] [u'0.62'] [u'0.77'] [u'1.20']
    [u'Block trades*'] [u'3,874'] [u'4,106'] [u'4,463']
    [u'Adv. volume'] [u'1,920,440,454'] [u'2,541,919,125'] [u'1,425,279,645']
    [u'Decl. volume'] [u'1,149,672,387'] [u'1,063,007,504'] [u'2,812,073,564']
    [u'Total volume'] [u'3,186,154,537'] [u'3,643,871,536'] [u'4,322,541,539']

    [u'Nasdaq'] [u'Latest close'] [u'Previous close'] [u'Week ago']
    [u'Issues traded'] [u'2,607'] [u'2,604'] [u'2,554']
    [u'Advances'] [u'1,085'] [u'1,596'] [u'633']
    [u'Declines'] [u'1,390'] [u'880'] [u'1,814']
    [u'Unchanged'] [u'132'] [u'128'] [u'107']
    [u'New highs'] [u'67'] [u'87'] [u'41']
    [u'New lows'] [u'36'] [u'36'] [u'83']
    [u'Closing tick'] [u'+225'] [u'+252'] [u'+588']
    [u'Closing Arms (TRIN)\x86'] [u'0.48'] [u'0.46'] [u'0.69']
    [u'Block trades'] [u'10,790'] [u'8,961'] [u'5,890']
    [u'Adv. volume'] [u'1,114,620,628'] [u'1,486,955,619'] [u'566,904,549']
    [u'Decl. volume'] [u'692,473,754'] [u'377,852,362'] [u'1,122,931,683']
    [u'Total volume'] [u'1,856,979,279'] [u'1,883,468,274'] [u'1,714,837,606']

    [u'NYSE Amex'] [u'Latest close'] [u'Previous close'] [u'Week ago']
    [u'Issues traded'] [u'434'] [u'432'] [u'439']
    [u'Advances'] [u'185'] [u'204'] [u'202']
    [u'Declines'] [u'228'] [u'202'] [u'210']
    [u'Unchanged'] [u'21'] [u'26'] [u'27']
    [u'New highs'] [u'10'] [u'12'] [u'29']
    [u'New lows'] [u'4'] [u'7'] [u'13']
    [u'Adv. volume*'] [u'2,365,755'] [u'5,581,737'] [u'11,992,771']
    [u'Decl. volume*'] [u'4,935,335'] [u'4,619,515'] [u'15,944,286']
    [u'Total volume*'] [u'7,430,052'] [u'10,835,106'] [u'28,152,571']
    [u'Closing tick'] [u'+32'] [u'+24'] [u'+24']
    [u'Closing Arms (TRIN)\x86'] [u'1.63'] [u'0.64'] [u'1.12']
    [u'Block trades*'] [u'75'] [u'113'] [u'171']

    [u'NYSE Arca'] [u'Latest close'] [u'Previous close'] [u'Week ago']
    [u'Issues traded'] [u'1,188'] [u'1,205'] [u'1,176']
    [u'Advances'] [u'580'] [u'825'] [u'423']
    [u'Declines'] [u'562'] [u'361'] [u'730']
    [u'Unchanged'] [u'46'] [u'19'] [u'23']
    [u'New highs'] [u'17'] [u'45'] [u'42']
    [u'New lows'] [u'5'] [u'25'] [u'12']
    [u'Adv. volume*'] [u'72,982,336'] [u'140,815,734'] [u'73,868,550']
    [u'Decl. volume*'] [u'58,099,822'] [u'31,998,976'] [u'185,213,281']
    [u'Total volume*'] [u'146,162,965'] [u'175,440,329'] [u'260,075,071']
    [u'Closing tick'] [u'+213'] [u'+165'] [u'+83']
    [u'Closing Arms (TRIN)\x86'] [u'0.86'] [u'0.73'] [u'1.37']
    [u'Block trades*'] [u'834'] [u'1,043'] [u'1,593']

What I want is to get only the NYSE data and nothing else, and I do not know whether that is possible. I also want to do something like this (pseudocode):

    if cell.contents[0] == "Advances":
        advances = next cell   # <-- this is the part I am not sure how to do

Can someone point me in the right direction for getting the first data point in the Advances row? There are other values I will need as well, but I figure that once I understand how to do this one, I can do the rest.
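For what it's worth, here is a plain-Python sketch of the kind of lookup I mean. The `rows` list and the `first_value` helper are made up for illustration (they stand in for the lists of cell text that each `row.findAll("td")` would produce, with values copied from the output above), not real BeautifulSoup objects:

```python
# Hypothetical sample: one list per <tr>, shaped like the parsed output above.
# Only a few NYSE rows plus the start of the Nasdaq block are included.
rows = [
    ["NYSE", "Latest close", "Previous close", "Week ago"],
    ["Issues traded", "3,114", "3,136", "3,134"],
    ["Advances", "1,529", "1,959", "1,142"],
    ["Declines", "1,473", "1,070", "1,881"],
    ["Nasdaq", "Latest close", "Previous close", "Week ago"],
    ["Advances", "1,085", "1,596", "633"],
]

def first_value(rows, exchange, label):
    """Return the 'Latest close' value for `label`, but only inside the
    block that starts with the `exchange` header row."""
    in_block = False
    for row in rows:
        if row[1:4] == ["Latest close", "Previous close", "Week ago"]:
            # An exchange header row ("NYSE", "Nasdaq", ...): it starts a
            # new block, so track whether it is the one we want.
            in_block = (row[0] == exchange)
        elif in_block and row and row[0] == label:
            return row[1]  # the cell right next to the row header
    return None

print(first_value(rows, "NYSE", "Advances"))  # prints 1,529 -- not 1,085
```

The same idea should carry over to the real scrape: treat any row whose trailing cells read "Latest close" / "Previous close" / "Week ago" as the start of a new exchange block, and only pull header/value pairs while inside the NYSE block.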
Thanks,
Tom
--
http://mail.python.org/mailman/listinfo/python-list