On 19 Jul, 12:23, davidgp <davidvanijzendo...@gmail.com> wrote:
> Hello, I'm new to this group, and quite new to Python!
> I'm trying to scrape some address data from bundes-telefonbuch.de, but I
> run into a problem:
> the link looks like this: http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
> and it is basically the same for every search query,
> so I need to submit POST data to the web server. I try to do it
> like this:
>
> import urllib, urllib2
>
> opener = urllib2.build_opener()
> opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
> urllib2.install_opener(opener)
>
> data = urllib.urlencode({'F0': 'mySearchKeyword', 'B': 'T', 'F8': 'A || G',
>                          'W': '1', 'Z': '0', 'HA': '10',
>                          'SAS_static_0_treffer_treffer': 'Suche starten',
>                          'S': '1', 'translationtemplate': 'checkstrasse'})
>
> url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
> response = urllib2.urlopen(url, data)
>
> This returns a page saying I have to re-enter my search terms.
> What's going wrong here?
>
> Thanks!!
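A frequent reason for a "please re-enter your search" page is that the server expects state set up by an earlier request, for example a session cookie issued when the search page is first loaded. Whether this site works that way is an assumption. The sketch below uses Python 3's urllib.request and http.cookiejar (the successors of urllib2 and cookielib): it first GETs a start page so that any cookie lands in a CookieJar, then sends the POST through the same opener. The helper names build_opener_with_cookies, build_search_request and search are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Browser-like User-Agent, copied from the question.
USER_AGENT = "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)"


def build_opener_with_cookies():
    """Return an opener that keeps cookies between requests.

    Assumption: the site tracks the search session with a cookie.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def build_search_request(url, fields):
    """Build (but do not send) a POST request with URL-encoded form fields."""
    # urlencode percent-encodes everything, so the result is always ASCII.
    data = urllib.parse.urlencode(fields).encode("ascii")
    req = urllib.request.Request(url, data=data)  # data != None -> POST
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def search(start_url, form_url, fields):
    """GET the start page first (to collect cookies), then POST the form."""
    opener, _ = build_opener_with_cookies()
    opener.open(start_url).read()  # body ignored; only the cookies matter here
    return opener.open(build_search_request(form_url, fields)).read()
```

Reusing one opener matters: cookies picked up by the first request are only sent again if the second request goes through the same HTTPCookieProcessor.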
Try mechanize: http://wwwsearch.sourceforge.net/mechanize/

import mechanize

# Fetch the start page and parse the forms it contains
response = mechanize.urlopen("http://www.bundes-telefonbuch.de/")
forms = mechanize.ParseResponse(response, backwards_compat=False)
form = forms[0]
form["F0"] = "query"  # enter the search query

# form.click() builds the request from all of the form's fields
html = mechanize.urlopen(form.click()).read()

# Save the result page for inspection
with open("tmp.html", "wb") as f:
    f.write(html)

Or you can try to parse the response yourself, but I think their HTML is not valid.

-- 
http://mail.python.org/mailman/listinfo/python-list
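If you do parse the response yourself, Python 3's html.parser.HTMLParser does not require valid HTML, so it copes with unquoted attributes, uppercase tags and unclosed elements. The sketch below (class FormFieldCollector and function collect_form_fields are hypothetical names) collects the name/value pairs of every <input> on a page; the idea, assuming the search form carries hidden fields the server checks, is to merge them into the POST data. It deliberately ignores <select>, <textarea> and checkbox/radio "checked" semantics.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements from (possibly invalid) HTML."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "INPUT" also arrives as "input".
        # Self-closing "<input ... />" is routed here by the default
        # handle_startendtag as well.
        if tag == "input":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # A missing (or valueless) value attribute is stored as "".
                self.fields[name] = attrs.get("value") or ""


def collect_form_fields(html):
    """Return a dict of input names to values found in the HTML string."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The resulting dict can be updated with your own search term and passed to urlencode for the POST.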