Bugs item #1424148, was opened at 2006-02-04 12:35
Message generated for change (Comment added) made by jimjjewett
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1424148&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 6
Submitted By: Robert Kiendl (kxroberto)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.FancyURLopener.redirect_internal looses data on POST!

Initial Comment:

    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)

... has to become ...

    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl, data)

... I guess? (",data" added)

Robert

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-02-06 15:52

Message:
Logged In: YES
user_id=764593

Sorry, I was trying to provide a quick explanation of why we couldn't just "do the obvious thing" and repost with data.

Yes, I realize that in practice, GET is used for non-idempotent actions, and POST is (though less often) done automatically. But since that is the official policy, I wouldn't want to bet too heavily against it in a courtroom -- so Python defaults should be at least as conservative as both the spec and the common practice.
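To see the behaviour this report describes in a current Python, the stock redirect handler of Python 3's `urllib.request` (the successor of urllib2, not the Python 2.4 `urllib` discussed here) can be asked directly what it would do with a redirected POST. This is a minimal sketch under that assumption; `redirected_post()` and the example.com URLs are hypothetical, and nothing is sent over the network:

```python
import urllib.error
import urllib.request


def redirected_post(code, body=b"q=python"):
    """Ask the stock redirect handler what it would do with a redirected POST.

    Hypothetical helper: returns (method, data) of the follow-up request the
    handler builds, or None if the handler refuses to redirect. The URLs are
    placeholders and no connection is made.
    """
    handler = urllib.request.HTTPRedirectHandler()
    post = urllib.request.Request("http://example.com/form", data=body)
    try:
        follow_up = handler.redirect_request(
            post, None, code, "Redirect", {}, "http://example.com/result")
    except urllib.error.HTTPError:
        # The default handler raises instead of redirecting this combination.
        return None
    return follow_up.get_method(), follow_up.data


for status in (301, 302, 303, 307):
    print(status, redirected_post(status))
```

Under Python 3, statuses 301, 302 and 303 each produce a follow-up `GET` whose `data` is `None`, so the POST body is dropped just as in this report, while a 307 is refused with `HTTPError` rather than reposted.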
----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-02-06 15:24

Message:
Logged In: YES
user_id=261020

First, anyone replying to this, *please* read this page (and the whole of this tracker note!) first:

http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html

kxroberto: you say that with standard urllibX error handling you cannot get an exception on a redirected 301/302/307 POST. That's not true of urllib2, since you may override HTTPRedirectHandler.redirect_request(), a method that was designed and documented for precisely that purpose. It seems sensible to have a default that does what virtually all browsers do (speaking as a long-time lynx user!). I don't know about the urllib case.

It's perfectly reasonable to extend urllib (if necessary) to allow the option of raising an exception. Note, however, that (IIRC!) urllib's exceptions do not contain the response body data (urllib2's HTTPErrors do contain the response body data).

It would of course break backwards compatibility to start raising exceptions by default here. I don't think it's reasonable to break old code on the basis of a notional security issue when the de facto standard web client behaviour is to do the redirect. In reality, the only "security" value of the original prescriptive rule was as a convention to be followed by white-hat web programmers and web client implementors to help users avoid unintentionally re-submitting non-idempotent requests. Since that convention is NOT followed in the real world (lynx doesn't count as the real world ;-), I see no value in sticking rigidly to the original RFC spec -- especially when RFC 2616 even provides 307 precisely in response to this problem. Other web client libraries, for example libwww-perl and Java HTTPClient, do the same as Python here IIRC. RFC 2616 section 10.3.4 even suggests web programmers use 302 to get the behaviour you complain about!

The only doubtful case here is 301.
A decision was made on the default behaviour in that case back when the tracker item I pointed you to was resolved. I think it's a mistake to change our minds again on that default behaviour.

    kxroberto.seek(nrBytes)
    assert kxroberto.readline() == """\
    To redirect POST as GET _while_ simply losing (!) the data (and not appending it to the GET-URL) is most bad for a lib."""

No. There is no value in supporting behaviour which is simply contrary to both de facto and prescriptive standards (see the final paragraph of RFC 2616 section 10.3.3: if we accept the "GET on POST redirect" rule, we must accept that the Location header is exactly the URL that should be followed). FYI, many servers return a redirect URL containing the urlencoded POST data from the original request.

kxroberto: """Don't know if the MS & netscape's also transpose to GET with long data? ..."""

urllib2's behaviour (and urllib's, I believe) on these issues is identical to that of IE and Firefox.

jimjewett: """In theory, a GET may be automatic, but a POST requires user interaction, so the user can be held accountable for the results of a POST, but not of a GET."""

That theory has been experimentally falsified ;-)

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-02-06 12:57

Message:
Logged In: YES
user_id=764593

In theory, a GET may be automatic, but a POST requires user interaction, so the user can be held accountable for the results of a POST, but not of a GET.

Often, the page will respond to either; not sending the queries protects privacy in case of problems, and works more often than not.

(That said, I too would prefer a raised error or a transparent repost, at least as options.)

----------------------------------------------------------------------

Comment By: Robert Kiendl (kxroberto)
Date: 2006-02-06 05:29

Message:
Logged In: YES
user_id=972995

> http://python.org/sf/549151

The analysis of the browsers is right.
Asking the user, as lynx does, is best. But urllibX is not a browser (an application) but a library: as things stand, with standard urllibX error handling you cannot write a lynx. gvr's initial suggestion to raise a clear error (with the redirection link as an attribute of the exception value) is the best option. Another option would be to simply yield the unredirected stub HTML and leave the 30X code (and the redirection Location in the header). To redirect POST as GET _while_ simply losing (!) the data (and not appending it to the GET-URL) is most bad for a lib. Smartly transcribing a short, form-like POST into a GET with a query string would be so-so. Don't know if the MS & netscape's also transpose to GET with long data? ... The current behaviour is the worst of all four. All the other methods would at least have raised an early hint/error in my case.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-02-05 19:54

Message:
Logged In: YES
user_id=261020

This is not a bug. See the long discussion here:

http://python.org/sf/549151

----------------------------------------------------------------------

Comment By: Robert Kiendl (kxroberto)
Date: 2006-02-04 15:10

Message:
Logged In: YES
user_id=972995

Found http://www.faqs.org/rfcs/rfc2616.html (below). But the behaviour is still strange, and the bug even more serious: a silent redirection of a POST as a GET without data is obscure for a language like Python. It leads to unpredictable results. The half-completed execution cannot be stopped, and everything is left to a sensible reaction by the server and complex reinterpretation by the client. Python urllibX should by default yield the 30X code for a POST redirection and provide the first HTML: usually a redirection HTML stub with <a href=... That would be consistent with the RFC: the user (= the application! not Python!) can redirect under full control without generating a wrong call! In my application, a bug went unseen for a long time because of this wrong behaviour.
With a 30X stub it would have been easy to discover and understand ...

urllib2 has the same bug with POST redirection.

=======

10.3.2 301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.

The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
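Both alternatives asked for in this thread, a clear error on a redirected POST and access to the unredirected 30X stub together with its Location header, can be had in Python 3's `urllib.request` (the successor of urllib2) through the `HTTPRedirectHandler.redirect_request()` hook mentioned above. This is a minimal sketch assuming Python 3, not code from the thread; `StrictPostRedirectHandler`, `demo()` and the example.com URLs are hypothetical:

```python
import io
import urllib.error
import urllib.request
from http.client import HTTPMessage


class StrictPostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: never turn a redirected POST into a GET.

    Instead of following the redirect it raises HTTPError, which keeps the
    30x status, the response headers (including Location) and the response
    body (the redirect stub), so the application decides what happens next.
    """

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if req.get_method() == "POST":
            raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)
        # GET/HEAD redirects keep the stock behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Real use: build_opener() substitutes this subclass for the default
# HTTPRedirectHandler; opener.open(url, data) then raises on a redirected POST.
opener = urllib.request.build_opener(StrictPostRedirectHandler())


def demo():
    """Hypothetical helper: feed the handler a fake 302 response (no network)."""
    stub = b'<a href="/done">moved</a>'
    headers = HTTPMessage()
    headers["Location"] = "/done"
    post = urllib.request.Request("http://example.com/form", data=b"a=1")
    try:
        StrictPostRedirectHandler().redirect_request(
            post, io.BytesIO(stub), 302, "Found", headers,
            "http://example.com/done")
    except urllib.error.HTTPError as err:
        return err.code, err.headers["Location"], err.read()
    return None


print(demo())
```

Because `HTTPError` wraps the response, the caller receives the status code, the headers and the stub body in one object and can choose to repost, issue a GET or stop; GET and HEAD redirects are still followed as usual.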