On Oct 10, 10:57 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
> On Oct 10, 6:12 pm, [EMAIL PROTECTED] wrote:
>
> > On Oct 10, 1:02 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
>
> > > On Oct 10, 2:32 pm, [EMAIL PROTECTED] wrote:
>
> > > > I have tried several ways to solve the following problem.
> > > > This is what I have:
>
> > > > ...
> > > > import ClientForm
> > > > import BeautifulSoup from BeautifulSoup
>
> > > > request = urllib2.Request('http://form.com/)
>
> > > > self.first_object = urllib2.open(request)
>
> > > > soup = BeautifulSoup(self.first_object)
>
> > > > forms = ClienForm.ParseResponse(self.first_object)
>
> > > > Now, when I do this, forms raises an IndexError because no forms
> > > > are returned, but the BeautifulSoup parses fine.
>
> > > First off, please copy and paste working code; the above has several
> > > syntax errors, so it can't raise IndexError (or anything else for
> > > that matter).
>
> > > > Now, when I switch the order to this:
>
> > > > import ClientForm
> > > > import BeautifulSoup from BeautifulSoup
>
> > > > request = urllib2.Request('http://form.com/)
>
> > > > self.first_object = urllib2.open(request)
>
> > > > forms = ClienForm.ParseResponse(self.first_object)
>
> > > > soup = BeautifulSoup(self.first_object)
>
> > > > Now, the form is returned correctly, but the BeautifulSoup object
> > > > comes back empty.
>
> > > > So what I can draw from this is that both methods erase the
> > > > properties of the object,
>
> > > No, that's not the case. What happens is that the http response
> > > object returned by urllib2.urlopen() is read by
> > > ClientForm.ParseResponse or BeautifulSoup - whichever happens
> > > first - and the second call has nothing left to read.
>
> > > The easiest solution is to save the request object and call
> > > urllib2.urlopen twice. Alternatively, check if ClientForm has a
> > > parse method that accepts strings instead of urllib2 responses, and
> > > then read and save the html text explicitly:
>
> > > >>> text = urllib2.urlopen(request).read()
> > > >>> soup = BeautifulSoup(text)
> > > >>> forms = ClientForm.ParseString(text)
>
> > > HTH,
> > > George
>
> > request = urllib2.Request(settings.register_page)
>
> > self.url_obj = urllib2.urlopen(request).read()
>
> > soup = BeautifulSoup(self.url_obj);
>
> > forms = ClientForm.ParseResponse(self.url_obj,
> >                                  backwards_compat=False)
>
> > Now I am getting this error:
>
> > Traceback (most recent call last):
> >   File "C:\Python25\Lib\site-packages\PyQt4\POS Pounder\Oct7\oct.py",
> > line 1251, in createAccounts
> >     forms = ClientForm.ParseResponse(self.url_obj,
> > backwards_compat=False)
> >   File "C:\Python25\lib\site-packages\clientform-0.2.9-py2.5.egg
> > \ClientForm.py", line 1054, in ParseResponse
> > AttributeError: 'str' object has no attribute 'geturl'
>
> Did you read what I wrote? ClientForm.ParseResponse() expects a
> response object, not a string. Browsing through its docs, it seems
> there is an alternative parsing function, ClientForm.ParseFile(file,
> base_uri, ...).
>
> The following should work (untested):
>
> from cStringIO import StringIO
>
> request = urllib2.Request(settings.register_page)
> response = urllib2.urlopen(request)
> text = response.read()
> soup = BeautifulSoup(text)
> forms = ClientForm.ParseFile(StringIO(text), response.geturl(),
>                              backwards_compat=False)
>
> HTH,
> George
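The consumed-response behaviour George describes is easy to check in
isolation: a urllib2 response is a file-like object, so the first read()
drains it and any later read() returns an empty string. A minimal sketch
(http://form.com/ is just the placeholder URL from the thread):

    import urllib2

    response = urllib2.urlopen('http://form.com/')
    first = response.read()      # drains the response: the full page
    second = response.read()     # nothing left to read: returns ''
    print len(first), len(second)   # the second length is 0

This is why whichever of BeautifulSoup or ClientForm runs second sees an
empty document.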
Hello George,

I seem to be running into the same problem as Vince. Your solution
seems very good, but ClientForm needs a little more from the handle
than just the text.

> The following should work (untested):
>
> from cStringIO import StringIO
>
> request = urllib2.Request(settings.register_page)
> response = urllib2.urlopen(request)
> text = response.read()
> soup = BeautifulSoup(text)
> forms = ClientForm.ParseFile(StringIO(text), response.geturl(),
>                              backwards_compat=False)

When running your code in my program, which does something very similar
to Vince's, I get:

AttributeError: 'cStringIO.StringI' object has no attribute 'geturl'

This makes perfect sense given the way ClientForm handles responses. It
seems that, short of figuring out how to deepcopy the handle, you're
going to be stuck making the request twice. But that hits the URL
(server) twice, which I would call a bad idea.

I've been struggling with this issue for some time now, and this is the
first place I've found a solid discussion of it.

-Josh
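For what it's worth, here is a sketch of a single-request workaround,
under the assumption (suggested by the two AttributeErrors above) that
ClientForm.ParseResponse() only needs read() and geturl() from the
object it is given: read the page once, then hand ClientForm a small
file-like wrapper that also carries the final URL. ResponseWrapper is a
hypothetical helper name, http://form.com/ again stands in for
settings.register_page, and this is untested against ClientForm 0.2.9:

    import urllib2
    import ClientForm
    from cStringIO import StringIO
    from BeautifulSoup import BeautifulSoup

    class ResponseWrapper:
        """File-like object exposing the read()/geturl() pair that
        ClientForm.ParseResponse() appears to rely on."""
        def __init__(self, text, url):
            self._file = StringIO(text)
            self._url = url

        def read(self, *args):
            return self._file.read(*args)

        def geturl(self):
            return self._url

        def close(self):
            self._file.close()

    url = 'http://form.com/'        # stand-in for settings.register_page
    response = urllib2.urlopen(url)
    text = response.read()          # hit the server once, keep the bytes

    soup = BeautifulSoup(text)      # BeautifulSoup parses the saved string
    forms = ClientForm.ParseResponse(ResponseWrapper(text, response.geturl()),
                                     backwards_compat=False)

This keeps one HTTP request per page. Whether ClientForm touches
anything on the response beyond read() and geturl() is worth verifying
against its source before relying on the wrapper.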