On Oct 10, 2:32 pm, [EMAIL PROTECTED] wrote:
> I have several ways to the following problem.
>
> This is what I have:
>
> ...
> import ClientForm
> import BeautifulSoup from BeautifulSoup
>
> request = urllib2.Request('http://form.com/)
>
> self.first_object = urllib2.open(request)
>
> soup = BeautifulSoup(self.first_object)
>
> forms = ClienForm.ParseResponse(self.first_object)
>
> Now, when I do this, forms returns an index errror because no forms
> are returned, but the BeautifulSoup registers fine.

First off, please copy and paste working code; the above has several
syntax errors, so it can't raise IndexError (or anything else, for that
matter).

> Now, when I switch the order to this:
>
> import ClientForm
> import BeautifulSoup from BeautifulSoup
>
> request = urllib2.Request('http://form.com/)
>
> self.first_object = urllib2.open(request)
>
> forms = ClienForm.ParseResponse(self.first_object)
>
> soup = BeautifulSoup(self.first_object)
>
> Now, the form is returned correctly, but the BeautifulSoup objects
> returns empty.
>
> So what I can draw from this is both methods erase the properties of
> the object,

No, that's not the case. The HTTP response object returned by
urllib2.urlopen() is a file-like object that can be read only once.
Whichever of ClientForm.ParseResponse or BeautifulSoup runs first reads
it to the end, and the second call has nothing left to read.

The easiest solution is to keep the request object and call
urllib2.urlopen twice, at the cost of a second HTTP request.
Alternatively, check whether ClientForm has a parse function that
accepts strings instead of response objects, then read and save the
HTML text explicitly:

>>> text = urllib2.urlopen(request).read()
>>> soup = BeautifulSoup(text)
>>> forms = ClientForm.ParseString(text)

HTH,
George
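
The read-once behaviour described above can be reproduced without any network access. This is only a minimal sketch: io.BytesIO stands in for the HTTP response object, and the HTML snippet and variable names are made up for illustration, not taken from the original code.

```python
import io

# Hypothetical page content; a BytesIO stands in for the file-like
# HTTP response, so no network access is needed.
PAGE = b"<html><body><form action='/go'></form></body></html>"

response = io.BytesIO(PAGE)
first = response.read()    # the first consumer drains the stream
second = response.read()   # the second consumer finds nothing left

# Reading once into a variable lets any number of parsers share the data.
response = io.BytesIO(PAGE)
text = response.read()
for_soup = text            # would be handed to the first parser
for_forms = text           # would be handed to the second parser
```

Here `first` holds the whole page and `second` is empty, which matches the symptom in the question: whichever parser runs second sees an empty document.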