New submission from Carl Meyer:

Both urllib and urllib2 call urllib.unquote() multiple times on data in the userinfo section of an FTP URL. One call occurs at the end of the urllib.splituser() function. In urllib, the other call appears in URLOpener.open_ftp(). In urllib2, the other two occur in FTPHandler.ftp_open() and Request.get_host().
The effect of this is that if the userinfo section of an FTP URL needs to contain a literal % sign followed by two hex digits, the % sign must be double-encoded as %2525 (for urllib) or triple-encoded as %252525 (for urllib2) in order for the URL to be accessed.

The proper behavior would be to only ever unquote a given data segment once. The IETF's URI: Generic Syntax RFC, RFC 3986 (http://gbiv.com/protocols/uri/rfc/rfc3986.html), addresses this very issue in section 2.4 (When to Encode or Decode): "Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string."

The solution would be to standardize where in urllib and urllib2 the unquoting happens, and then make sure it happens nowhere else. I'm not familiar enough with the libraries to know where it should be removed without possibly breaking other behavior. It seems that just removing the map/unquote call in urllib.splituser() would fix the problem in urllib. I would guess the call in urllib2 Request.get_host() should also be removed, as the RFC referenced above says clearly that only individual data segments of the URL should be decoded, not larger portions that might contain delimiters (: and @).

I've attached a patchset for these suggested changes. Very superficial testing suggests that the patch doesn't break anything obvious, but I make no guarantees.
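To make the behavior described above concrete, here is a minimal sketch of the "split on delimiters first, then decode each segment exactly once" approach. It is not the attached patch. It uses Python 3's urllib.parse.quote and urllib.parse.unquote as stand-ins for the Python 2.5 urllib.quote and urllib.unquote discussed in this report, and the helper name split_userinfo_once, the credentials, and the host are all made up for illustration.

```python
from urllib.parse import quote, unquote


def split_userinfo_once(netloc):
    """Split 'user:password@host' on its delimiters, then decode each part once.

    Hypothetical helper for illustration only; it is not a function in
    urllib or urllib2.
    """
    userinfo, at, host = netloc.rpartition("@")
    if not at:
        # No userinfo section at all.
        return None, None, netloc
    user, colon, password = userinfo.partition(":")
    return unquote(user), (unquote(password) if colon else None), host


# A made-up password containing a literal '%' followed by two hex digits.
password = "pw%41"
encoded = quote(password, safe="")            # 'pw%2541'
netloc = "carl:" + encoded + "@ftp.example.com"

# Decoding once recovers the intended password.
user, decoded, host = split_userinfo_once(netloc)

# Decoding the already-decoded value again corrupts it: '%41' becomes 'A'.
decoded_twice = unquote(decoded)              # 'pwA'

# Decoding the whole netloc before splitting misplaces an encoded delimiter.
tricky = quote("a:b", safe="") + ":secret@ftp.example.com"
good_user, good_pw, _ = split_userinfo_once(tricky)           # 'a:b', 'secret'
bad_user, _, bad_pw = unquote(tricky).rpartition("@")[0].partition(":")
```

With single decoding, a caller only has to percent-encode the literal % once (%25); each extra decoding pass is what forces the %2525 and %252525 workarounds.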
----------
components: Library (Lib)
files: urllib-issue.patch
keywords: patch
messages: 63324
nosy: carljm
severity: normal
status: open
title: urllib and urllib2 decode userinfo multiple times
type: behavior
versions: Python 2.5

Added file: http://bugs.python.org/file9621/urllib-issue.patch

__________________________________
Tracker <http://bugs.python.org/issue2244>
__________________________________