[issue3300] urllib.quote and unquote - Unicode issues

Matt Giuca Sun, 06 Jul 2008 18:45:39 -0700

Matt Giuca <[EMAIL PROTECTED]> added the comment:

Point taken. But the RFC certainly doesn't say that ISO-8859-1 should be
used. Since we're outputting a Unicode string in Python 3, we need to
decode with some encoding, and UTF-8 seems the most sensible and
standardised.
(Even the existing test case at test_urllib.py:466 uses a UTF-8-encoded
URL, and I had to fix it so that it decodes into a meaningful string.)


Having said that, you may wish to use another encoding, and it is legal
to do so. Therefore, I'd suggest we add an "encoding" argument to both
quote and unquote, which defaults to "utf-8".

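As a rough sketch of how such an argument behaves: the snippet below is
written against urllib.parse as it exists in current Python 3 releases,
where quote and unquote accept an "encoding" argument defaulting to UTF-8.
It is not the patch under discussion, and the sample string is an
assumption chosen only for illustration.

```python
from urllib.parse import quote, unquote

# Sample text is an assumption, chosen only for illustration.
text = "caf\u00e9"  # 'café'

# With the default encoding (UTF-8), é is quoted as the two bytes C3 A9.
utf8_quoted = quote(text)

# An explicit encoding overrides the default: é becomes the single byte E9.
latin1_quoted = quote(text, encoding="iso-8859-1")

print(utf8_quoted)    # caf%C3%A9
print(latin1_quoted)  # caf%E9

# Unquoting with the same encoding recovers the original string.
print(unquote(utf8_quoted) == text)                           # True
print(unquote(latin1_quoted, encoding="iso-8859-1") == text)  # True
```
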
Note that in the current implementation, unquote is not an inverse of
quote, because quote uses UTF-8 to encode characters with code points >=
256, while unquote decodes them as ISO-8859-1. I think it's important
that these two functions are inverses of each other.

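The mismatch can be reproduced with a small sketch. This does not run the
2008 implementation; it imitates the behaviour described above by passing
mismatched encodings explicitly to urllib.parse.quote and unquote from
current Python 3. The sample character is an assumption chosen because
its code point is exactly 256.

```python
from urllib.parse import quote, unquote

# Illustrative character (an assumption): U+0100, code point 256.
original = "\u0100"

# Quote using UTF-8: U+0100 is encoded as the two bytes C4 80.
quoted = quote(original, encoding="utf-8")

# Decoding those bytes as ISO-8859-1 yields two unrelated characters,
# so the round trip does not return the original string.
mismatched = unquote(quoted, encoding="iso-8859-1")

# Decoding with the same encoding used for quoting restores the original.
matched = unquote(quoted, encoding="utf-8")

print(quoted)                  # %C4%80
print(mismatched == original)  # False
print(matched == original)     # True
```

With a shared default encoding, unquote(quote(s)) == s holds for any
string s without the caller having to name the encoding at all.
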
_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________