python - HTML processing - need tips
I need to process an HTML form in Python. I'm using urllib2 and HTMLParser to handle the HTML. There are several steps I need to take to get to the specific page on the relevant site, the first of which is to log in with a username/password. The HTML that handles the login consists of 2 edit boxes (for User ID and Password) and a Submit button which uses ASP.NET client-side validation as follows (formatted for clarity):

    User ID: Valid Email format is required
    Password:

I've looked at all the relevant posts on this topic and have already looked at mechanize and ClientForm. It appears I can't use those, for 2 reasons:

1) they can't handle client-side validation, and
2) this button doesn't actually reside in a form, and I haven't been able to find any Python code that obtains a handle to a submit control and simulates clicking on it.

I've tried sending the server a POST message as such:

    loginParams = urllib.urlencode({'txtUserName': theUsername, 'txtUserPass': thePassword})
    txdata = None
    txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    req = Request(url1, txdata, txheaders)  # url1 points to the secure page seen following login
    handle = urlopen(req, loginParams)

But this doesn't work. I don't understand the use of Page_ClientValidate( ) and haven't really found any useful documentation on it for my purposes. I basically need to be able to submit this information to the site by simulating the onclick event through Python. As far as I understand, I need a solution to the 2 points I mentioned above (getting past client-side validation and simulating a click of a non-form button).

Any help on this (or on other issues I might have missed but that are important/relevant) would be great!

Many thanks,
Pythonner
--
http://mail.python.org/mailman/listinfo/python-list
Re: python - HTML processing - need tips
I figured it out... I just turned the POST request into a GET to see what was getting appended to the URL. Thanks, Gabe!

Gabriel Genellina wrote:
> At Monday 7/8/2006 20:58, wipit wrote:
>
> >I need to process a HTML form in python. I'm using urllib2 and
> >HTMLParser to handle the html. There are several steps I need to take
> >to get to the specific page on the relevant site the first of which is
> >to log in with a username/password. The html code that processes the
> >login consists of 2 edit boxes (for User ID and Password) and a Submit
> >button which uses ASP.net client side validation as follows (formatted
> >for clarity):
>
> Another approach would be using HTTPDebugger
> <http://www.softx.org/debugger.html> to see exactly what gets
> submitted, and then build a compatible Request.
> On many sites you don't even need to *get* the login page -nor parse
> it-, just posting the right Request is enough to log in successfully.
>
> Gabriel Genellina
> '@'.join(('gagsl-py','.'.join(('yahoo','com','ar'
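
For anyone finding this thread later, here is a rough sketch of the "build a compatible Request" idea, with several assumptions spelled out. It uses the Python 3 module names (html.parser, urllib.request) rather than the urllib2 used in the thread. The field names txtUserName and txtUserPass come from the original post; the helper names (InputCollector, collect_form_fields, build_login_request, as_get_url) and the example URL are made up for illustration. It also assumes that the login page carries extra hidden inputs such as __VIEWSTATE that the server expects to get back, which is common for ASP.NET pages but cannot be confirmed for this site since the page's HTML did not survive in the archive. Client-side validation is only JavaScript running in the browser, so a script that posts the fields directly never runs it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag on a page.

    Simplification: submit buttons, checkboxes and radio buttons are
    collected like any other input, which a real browser would not do.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        # Inputs without a name are never submitted, so skip them.
        if name:
            self.fields[name] = attributes.get("value") or ""


def collect_form_fields(page_html):
    """Return a dict of the input fields found in the given HTML text."""
    parser = InputCollector()
    parser.feed(page_html)
    parser.close()
    return parser.fields


def build_login_request(login_url, page_html, username, password):
    """Build (but do not send) a POST request that mimics the login submit."""
    fields = collect_form_fields(page_html)
    # Field names taken from the original post.
    fields["txtUserName"] = username
    fields["txtUserPass"] = password
    body = urlencode(fields).encode("ascii")
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    # Passing data makes urllib issue a POST instead of a GET.
    return Request(login_url, data=body, headers=headers)


def as_get_url(login_url, fields):
    """Show the same fields as a GET query string, handy for inspection."""
    return login_url + "?" + urlencode(fields)
```

To actually log in, the returned request would be passed to urllib.request.urlopen, normally through an opener that keeps cookies so that later pages see the session. as_get_url only mirrors the trick mentioned above of looking at the parameters as they would appear in a URL; whether the server accepts them over GET depends on the site.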