Your program is doing what you asked it to do. It finds the
first table with class 'bp_ergebnis_tab_info'. Then it ignores
that result. Then it finds the first "td" item in the whole document
(it searches "soup", not "table"), and prints the contents of that.
Then it exits. What did you want it to do?
Try this. It prints out the TD items on each
row of the table, in order.
import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
for row in table.findAll('tr'):      # for all TR items (table rows)
    for td in row.findAll('td'):     # for TD items in row
        text = td.renderContents().strip()
        print(text)
    print('-----')                   # mark end of row
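
If you want to try the row-by-row idea without hitting the live site, here is a rough sketch. Note the assumptions: it is written for Python 3 and uses the standard library's html.parser in place of BeautifulSoup, the sample HTML is invented (it is not the real page), and TableCells and cells_by_row are names made up for this example. Unlike renderContents(), it keeps only the text of each cell and drops inner markup such as <strong>.

```python
from html.parser import HTMLParser

# Invented sample page: an unrelated table first, then the table we want.
SAMPLE_HTML = """
<table><tr><td><strong>Schuldaten</strong></td></tr></table>
<table class="bp_ergebnis_tab_info">
  <tr><td>Name</td><td>Example School</td></tr>
  <tr><td>City</td><td>Example Town</td></tr>
</table>
"""


class TableCells(HTMLParser):
    """Collect TD text, grouped by row, inside tables of one class."""

    def __init__(self, table_class):
        HTMLParser.__init__(self)
        self.table_class = table_class
        self.depth = 0       # > 0 while inside the wanted table
        self.in_td = False   # True while inside a TD of that table
        self.rows = []       # one list of cell strings per TR

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            classes = (dict(attrs).get('class') or '').split()
            # Enter the wanted table, or track a table nested inside it.
            if self.depth or self.table_class in classes:
                self.depth += 1
        elif self.depth and tag == 'tr':
            self.rows.append([])
        elif self.depth and tag == 'td' and self.rows:
            self.rows[-1].append('')
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == 'table' and self.depth:
            self.depth -= 1
        elif tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.rows[-1][-1] += data


def cells_by_row(html, table_class):
    """Return [[cell, ...], ...] for rows of tables with the given class."""
    parser = TableCells(table_class)
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


for row in cells_by_row(SAMPLE_HTML, 'bp_ergebnis_tab_info'):
    for text in row:
        print(text)
    print('-----')  # mark end of row
```

The first table in the sample plays the part of whatever holds the first "td" in the real document: its cell is skipped here because the search is limited to the table with the wanted class.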
John Nagle
On 12/25/2010 9:58 AM, Martin Kaspar wrote:
Hello dear Community,
I am trying to get a scraper up and running, and I keep running into
problems.
When I try what I have learned so far, I only get:
<strong>Schuldaten</strong>
Here is the code that I used:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
first_td = soup.find('td')
text = first_td.renderContents()
trimmed_text = text.strip()
print trimmed_text
I run it in the template at http://scraperwiki.com/scrapers/new/python
The target page is:
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
What have I gotten wrong?
Can anybody review the code?
Many thanks in advance.
regards
matze
--
http://mail.python.org/mailman/listinfo/python-list