[EMAIL PROTECTED] wrote:
> I'm trying to get the data in the "Central London Property Price Guide"
> box at the left hand side of this page:
> http://www.findaproperty.com/regi0018.html
>
> I have managed to get the data :) but when I start looking for tables I
> only get tables of depth 1 - how do I go about accessing inner tables?
> The same happens for links...
>
> This is what I've got so far:
>
> import sys
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
>
> data = urlopen('http://www.findaproperty.com/regi0018.html').read()
> soup = BeautifulSoup(data)
>
> for tables in soup('table'):
>     table = tables('table')
>     if not table: continue
>     print table  # this returns only 1 table
There's something fishy here. soup('table') should yield all the tables in the document, even nested ones. For example, this program:

data = '''
<body>
<table width='100%'>
<tr><td>
<TABLE WIDTH='150'>
<tr><td>Stuff</td></tr>
</table>
</td></tr>
</table>
</body>
'''

from BeautifulSoup import BeautifulSoup as BS
soup = BS(data)
for table in soup('table'):
    print table.get('width')

prints:

100%
150

Another tidbit - if I open the page in Firefox and save it, then load that saved file into BeautifulSoup, it finds 25 tables, and this code finds the table you want:

from BeautifulSoup import BeautifulSoup
data2 = open('regi0018-firefox.html')
soup = BeautifulSoup(data2)

print len(soup('table'))

priceGuide = soup('table', dict(bgcolor="#e0f0f8", border="0",
                                cellpadding="2", cellspacing="2",
                                width="150"))[1]
print priceGuide.tr

prints:

25
<tr><td bgcolor="#e0f0f8" valign="top"><font face="Arial" size="2"><b>Central London Property Price Guide</b></font></td></tr>

Looking at the saved file, Firefox has clearly done some cleanup. So I think you have to look at why BS is not processing the original data the way you want. It seems to be choking on something.

Kent
--
http://mail.python.org/mailman/listinfo/python-list
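For anyone following along without the old BeautifulSoup package: the same point - that a parser which walks the whole tree sees nested tables at every depth - can be demonstrated with nothing but the standard library. This is a minimal sketch in modern Python using html.parser, not the original code; the class name TableCollector is made up for illustration, and the HTML is the same nested-table snippet as above.

```python
# Minimal sketch: collect the 'width' attribute of every <table>,
# nested or not, using only the standard library's html.parser.
from html.parser import HTMLParser

class TableCollector(HTMLParser):
    """Record the width attribute of each <table> start tag seen."""
    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TABLE> also matches here
        if tag == "table":
            self.widths.append(dict(attrs).get("width"))

data = """
<body>
<table width='100%'>
<tr><td>
<TABLE WIDTH='150'>
<tr><td>Stuff</td></tr>
</table>
</td></tr>
</table>
</body>
"""

collector = TableCollector()
collector.feed(data)
print(collector.widths)  # both the outer and the inner table are seen
```

If this prints both widths but BeautifulSoup on the live page does not, that supports the diagnosis above: the parser is choking on the site's malformed markup, not on nesting itself.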