[EMAIL PROTECTED] wrote:
> I'm trying to get the data on the "Central London Property Price Guide"
> box at the left hand side of this page
> http://www.findaproperty.com/regi0018.html
> 
> I have managed to get the data :) but when I start looking for tables I
> only get tables of depth 1 how do I go about accessing inner tables?
> same happens for links...
> 
> this is what I've got so far
> 
> import sys
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
> 
> data = urlopen('http://www.findaproperty.com/regi0018.html').read()
> soup = BeautifulSoup(data)
> 
> for tables in soup('table'):
>       table = tables('table')
>       if not table: continue
>       print table #this returns only 1 table

There's something fishy here. soup('table') should yield all the tables
in the document, even nested ones. For example, this program:

data = '''
<body>
      <table width='100%'>
          <tr><td>
              <TABLE WIDTH='150'>
                  <tr><td>Stuff</td></tr>
              </table>
          </td></tr>
      </table>
</body>
'''

from BeautifulSoup import BeautifulSoup as BS

soup = BS(data)
for table in soup('table'):
      print table.get('width')


prints:
100%
150
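
The same should hold for links - soup('a') returns every <a> tag in the
document no matter how deeply it is nested. A quick sketch (the markup
here is made up, not from the page you're scraping):

from BeautifulSoup import BeautifulSoup as BS

data = '''
<body>
      <table><tr><td>
          <a href="http://example.com/outer">outer</a>
          <div><a href="http://example.com/inner">inner</a></div>
      </td></tr></table>
</body>
'''

soup = BS(data)
for link in soup('a'):
      # prints both hrefs, even though one is buried an extra level down
      print link.get('href')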

Another tidbit - if I open the page in Firefox and save it, then open 
that file into BeautifulSoup, it finds 25 tables and this code finds the 
table you want:

from BeautifulSoup import BeautifulSoup
data2 = open('regi0018-firefox.html')
soup = BeautifulSoup(data2)

print len(soup('table'))

priceGuide = soup('table', dict(bgcolor="#e0f0f8", border="0", 
cellpadding="2", cellspacing="2", width="150"))[1]
print priceGuide.tr


prints:
25
<tr><td bgcolor="#e0f0f8" valign="top"><font face="Arial" 
size="2"><b>Central London Property Price Guide</b></font></td></tr>


Looking at the saved file, Firefox has clearly done some cleanup. So I 
think you have to look at why BS is not processing the original data the 
way you want. It seems to be choking on something.
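
One quick way to see roughly where it gives up (just a diagnostic sketch,
not something from your original code) is to parse the live page and look
at how much structure BS actually kept:

from urllib import urlopen
from BeautifulSoup import BeautifulSoup

data = urlopen('http://www.findaproperty.com/regi0018.html').read()
soup = BeautifulSoup(data)

# how many tables BS sees, versus the 25 in Firefox's cleaned-up copy
print len(soup('table'))

# the tail of prettify() shows roughly how far BS got before the
# structure falls apart
print soup.prettify()[-500:]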

Kent