New submission from Matt Giuca <matt.gi...@gmail.com>:

urllib.unquote fails to decode a percent-escape with mixed case. To demonstrate:

>>> unquote("%fc")
'\xfc'
>>> unquote("%FC")
'\xfc'
>>> unquote("%Fc")
'%Fc'
>>> unquote("%fC")
'%fC'

Expected behaviour:

>>> unquote("%Fc")
'\xfc'
>>> unquote("%fC")
'\xfc'

I actually fixed this bug in Python 3, at Guido's request, as part of the huge fix to issue 3300. To quote Guido:

> # Maps lowercase and uppercase variants (but not mixed case).
> That sounds like a disaster. Why would %aa and %AA be correct but
> not %aA and %Aa? (Even though the old code had the same problem.)

(Indeed, RFC 3986 allows mixed-case percent escapes.)

I have attached a patch which fixes it by removing the dict that maps all lowercase and uppercase variants to characters, and calling int(item[:2], 16) instead. It's slower, but correct. This is the same solution we used in Python 3.

I've also backported a number of test cases from Python 3 which cover this issue, as well as genuinely malformed percent encoding.

Note: I've also backported the remainder of the 'unquote' test cases from Python 3, but I found another bug, so I will report that separately, with a patch.

----------
components: Library (Lib)
files: urllib-unquote-mixcase.patch
keywords: patch
messages: 101044
nosy: mgiuca
severity: normal
status: open
title: urllib.unquote doesn't decode mixed-case percent escapes
type: behavior
versions: Python 2.6, Python 2.7

Added file: http://bugs.python.org/file16540/urllib-unquote-mixcase.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8135>
_______________________________________
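
For illustration only, the approach described in the report (dropping the case-specific lookup table and parsing the two hex digits numerically with int(..., 16)) might look roughly like the sketch below. This is not the attached patch: the name unquote_sketch and the explicit hex-digit guard are assumptions of this example. The guard is there because int() on its own also accepts inputs such as '+f' or a single digit, which are not valid percent escapes.

```python
import string

# '0'-'9', 'a'-'f', 'A'-'F'
_HEX_DIGITS = frozenset(string.hexdigits)


def unquote_sketch(s):
    """Decode %XX escapes in s, accepting any mix of upper/lowercase hex.

    Hypothetical helper for illustration; not the stdlib implementation.
    """
    parts = s.split('%')
    result = [parts[0]]  # text before the first '%' is literal
    for item in parts[1:]:
        pair = item[:2]
        # Guard added in this sketch: only two real hex digits count
        # as an escape, regardless of their case.
        if len(pair) == 2 and all(c in _HEX_DIGITS for c in pair):
            result.append(chr(int(pair, 16)) + item[2:])
        else:
            # Leave malformed escapes untouched.
            result.append('%' + item)
    return ''.join(result)
```

With this sketch, "%fc", "%FC", "%Fc" and "%fC" all decode to the same character, while malformed input such as "%zz" or a trailing "%" is passed through unchanged.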