[issue3300] urllib.quote and unquote - Unicode issues

Guido van Rossum Wed, 06 Aug 2008 14:39:55 -0700

Guido van Rossum <[EMAIL PROTECTED]> added the comment:

Dear GvR,


New code review comments by GvR have been published.
Please go to http://codereview.appspot.com/2827 to read them.

Message:
Hi Matt,

Here's a code review of your patch.

I'm leaning more and more towards wanting this for 3.0, but I have some API
design issues and also some style nits.

I'm cross-linking this with the Python tracker issue, through the subject.

Details:

http://codereview.appspot.com/2827/diff/1/2
File Doc/library/urllib.parse.rst (right):

http://codereview.appspot.com/2827/diff/1/2#newcode198
Line 198: replaced by a placeholder character.
I don't think that's a good default.  I'd rather see it default to strict --
that's what encoding translates to everywhere else.  I believe that lenient
error handling by default can cause subtle security problems too, by hiding
problem characters from validation code.

http://codereview.appspot.com/2827/diff/1/2#newcode215
Line 215: An alias for :func:`quote`, intended for use with a :class:`bytes`
object
I'd rather see this as a wrapper that raises TypeError if the argument  
isn't a
bytes or bytearray instance.  Otherwise it's needless redundancy.

http://codereview.appspot.com/2827/diff/1/2#newcode223
Line 223: Replace ``%xx`` escapes by their single-character equivalent.
Should add what the argument type is -- I vote for str or bytes/bytearray.

http://codereview.appspot.com/2827/diff/1/2#newcode242
Line 242: .. function:: unquote_to_bytes(string)
Again, add what the argument type is.

http://codereview.appspot.com/2827/diff/1/4
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/1/4#newcode224
Line 224: except:
An unqualified except clause is unacceptable here.  Why do you need this  
anyway?

http://codereview.appspot.com/2827/diff/1/5
File Lib/test/test_http_cookiejar.py (right):

http://codereview.appspot.com/2827/diff/1/5#newcode1450
Line 1450: "%3c%3c%0Anew%C3%A5/%C3%A5",
I'm guessing this test broke otherwise?  Given that this references an RFC,  
is
it correct to just fix it this way?

http://codereview.appspot.com/2827/diff/1/3
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/1/3#newcode10
Line 10: "urlsplit", "urlunsplit"]
Please add all the quote/unquote versions here too.
(They were there in 2.5, but somehow got dropped from 3.0.

http://codereview.appspot.com/2827/diff/1/3#newcode265
Line 265: # Maps lowercase and uppercase variants (but not mixed case).
That sounds like a disaster.  Why would %aa and %AA be correct but not %aA  
and
%Aa?  (Even though the old code had the same problem.)

http://codereview.appspot.com/2827/diff/1/3#newcode283
Line 283: def unquote(s, encoding = "utf-8", errors = "replace"):
Please no spaces around the '=' when used for an argument default (or for a
keyword arg).

Also see my comment about defaulting to 'replace' in the doc file.

Finally -- let's be consistent about quotes.  It seems most of this file  
uses
single quotes, so lets stick to that (except docstrings always use double
quotes).

And more: what should a None value for encoding or errors mean?  IMO it  
should
mean "use the default".

http://codereview.appspot.com/2827/diff/1/3#newcode382
Line 382: safe = safe.encode('ascii', 'ignore')
Using errors='ignore' seems like a mistake -- it will hide errors.

I also wonder why safe should be limited to ASCII though.

http://codereview.appspot.com/2827/diff/1/3#newcode399
Line 399: if ' ' in s:
This test means that it won't work if the input is bytes.  E.g.

urllib.parse.quote_plus(b"abc def")

raises a TypeError.

Sincerely,

   Your friendly code review daemon (http://codereview.appspot.com/).

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue3300] urllib.quote and unquote - Unicode issues

Reply via email to