Matt Giuca <[EMAIL PROTECTED]> added the comment: Following Guido and Antoine's reviews, I've written a new patch which fixes *most* of the issues raised. The ones I didn't fix I have noted below, and commented on the review site (http://codereview.appspot.com/2827/). Note: I intend to address all of these issues after some discussion.
Outstanding issues raised by the reviews: Doc/library/urllib.parse.rst: Should unquote accept a bytes/bytearray as well as a str? Lib/email/utils.py: Should encode_rfc2231 with charset=None accept strings with non-ASCII characters, and just encode them to UTF-8? Lib/test/test_http_cookiejar.py: Does RFC 2965 let me get away with changing the test case to expect UTF-8? (I'm pretty sure it doesn't care what encoding is used). Lib/test/test_urllib.py: Should quote raise a TypeError if given a bytes with encoding/errors arguments? (Motivation: TypeError is what you usually raise if you supply too many args to a function). Lib/urllib/parse.py: (As discussed above) Should quote accept safe characters outside the ASCII range (thereby potentially producing invalid URIs)? ------ Commit log for patch8: Fix for issue 3300. urllib.parse.unquote: Added "encoding" and "errors" optional arguments, allowing the caller to determine the decoding of percent-encoded octets. As per RFC 3986, default is "utf-8" (previously implicitly decoded as ISO-8859-1). Also fixed a bug in which mixed-case hex digits (such as "%aF") weren't being decoded at all. urllib.parse.quote: Added "encoding" and "errors" optional arguments, allowing the caller to determine the encoding of non-ASCII characters before being percent-encoded. Default is "utf-8" (previously characters in range(128, 256) were encoded as ISO-8859-1, and characters above that as UTF-8). Also characters/bytes above 128 are no longer allowed to be "safe". Also now allows either bytes or strings. Added functions urllib.parse.quote_from_bytes, urllib.parse.unquote_to_bytes. All quote/unquote functions now exported from the module. Doc/library/urllib.parse.rst: Updated docs on quote and unquote to reflect new interface, added quote_from_bytes and unquote_to_bytes. Lib/test/test_urllib.py: Added many new test cases testing encoding and decoding Unicode strings with various encodings, as well as testing the new functions. Lib/test/test_http_cookiejar.py, Lib/test/test_cgi.py, Lib/test/test_wsgiref.py: Updated and added test cases to deal with UTF-8-encoded URIs. Lib/email/utils.py: Calls urllib.parse.quote and urllib.parse.unquote with encoding="latin-1", to preserve existing behaviour (which the whole email module is dependent upon). Added file: http://bugs.python.org/file11069/parse.py.patch8 _______________________________________ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3300> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com