On Oct 10, 6:12 pm, [EMAIL PROTECTED] wrote:
> On Oct 10, 1:02 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
>
> > On Oct 10, 2:32 pm, [EMAIL PROTECTED] wrote:
> >
> > > I have several ways to the following problem.
> > >
> > > This is what I have:
> > >
> > > ...
> > > import ClientForm
> > > import BeautifulSoup from BeautifulSoup
> > >
> > > request = urllib2.Request('http://form.com/)
> > >
> > > self.first_object = urllib2.open(request)
> > >
> > > soup = BeautifulSoup(self.first_object)
> > >
> > > forms = ClienForm.ParseResponse(self.first_object)
> > >
> > > Now, when I do this, forms returns an index errror because no forms
> > > are returned, but the BeautifulSoup registers fine.
> >
> > First off, please copy and paste working code; the above has several
> > syntax errors, so it can't raise IndexError (or anything else for that
> > matter).
> >
> > > Now, when I switch the order to this:
> > >
> > > import ClientForm
> > > import BeautifulSoup from BeautifulSoup
> > >
> > > request = urllib2.Request('http://form.com/)
> > >
> > > self.first_object = urllib2.open(request)
> > >
> > > forms = ClienForm.ParseResponse(self.first_object)
> > >
> > > soup = BeautifulSoup(self.first_object)
> > >
> > > Now, the form is returned correctly, but the BeautifulSoup objects
> > > returns empty.
> > >
> > > So what I can draw from this is both methods erase the properties of
> > > the object,
> >
> > No, that's not the case. What happens is that the http response object
> > returned by urllib2.open() is read by the ClienForm.ParseResponse or
> > BeautifulSoup - whatever happens first - and the second call has
> > nothing to read.
> >
> > The easiest solution is to save the request object and call
> > urllib2.open twice. Alternatively check if ClientForm has a parse
> > method that accepts strings instead of urllib2 requests and then read
> > and save the html text explicitly:
> >
> > >>> text = urllib2.open(request).read()
> > >>> soup = BeautifulSoup(text)
> > >>> forms = ClientForm.ParseString(text)
> >
> > HTH,
> > George
>
> request = urllib2.Request(settings.register_page)
> self.url_obj = urllib2.urlopen(request).read()
> soup = BeautifulSoup(self.url_obj);
> forms = ClientForm.ParseResponse(self.url_obj,
>                                  backwards_compat=False)
>
> Now I am getting this error:
>
> Traceback (most recent call last):
>   File "C:\Python25\Lib\site-packages\PyQt4\POS Pounder\Oct7\oct.py", line 1251, in createAccounts
>     forms = ClientForm.ParseResponse(self.url_obj, backwards_compat=False)
>   File "C:\Python25\lib\site-packages\clientform-0.2.9-py2.5.egg\ClientForm.py", line 1054, in ParseResponse
> AttributeError: 'str' object has no attribute 'geturl'

Did you read what I wrote? ClientForm.ParseResponse() expects a
response object, not a string. Browsing through its docs, it seems
there is an alternative parsing function,
ClientForm.ParseFile(file, base_uri, ...). The following should work
(untested):

from cStringIO import StringIO

request = urllib2.Request(settings.register_page)
response = urllib2.urlopen(request)
text = response.read()
soup = BeautifulSoup(text)
forms = ClientForm.ParseFile(StringIO(text), response.geturl(),
                             backwards_compat=False)

HTH,
George
-- 
http://mail.python.org/mailman/listinfo/python-list
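
The read-once behaviour discussed in this thread can be reproduced without a network connection or either parsing library. The sketch below is only an illustration under stated assumptions: it uses Python 3's io.BytesIO as a stand-in for the HTTP response object (the thread itself uses Python 2's urllib2 and cStringIO), the HTML is made up, and consume() is a hypothetical placeholder for any parser that reads a file-like object to the end.

```python
import io

# Made-up HTML; a real document would come from the HTTP response.
html = b"<html><body><form action='/register'></form></body></html>"

# Stand-in for the response object: a file-like stream over the bytes.
response = io.BytesIO(html)


def consume(stream):
    """Hypothetical parser stand-in: reads the stream to the end."""
    return stream.read()


# Two consumers sharing one stream: the first drains it, the second
# starts at end-of-file and gets nothing.
first = consume(response)
second = consume(response)

# The approach suggested above: read once, keep the bytes, and give
# each consumer its own fresh file-like wrapper around them.
response = io.BytesIO(html)
text = response.read()
for_soup = consume(io.BytesIO(text))
for_forms = consume(io.BytesIO(text))
```

Here first holds the whole document while second is empty, which matches the symptom of whichever parser runs second finding nothing. Wrapping the saved text in a new stream per consumer is what StringIO(text) does in the reply above.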