[issue3300] urllib.quote and unquote - Unicode issues

Matt Giuca Wed, 13 Aug 2008 07:26:58 -0700

Matt Giuca <[EMAIL PROTECTED]> added the comment:

> I have no strong opinion on the very remaining points you listed,
> except that IMHO encode_rfc2231 with charset=None should not try to
> use UTF8 by default. But someone with more mail protocol skills
> should comment :)


OK, I've come to the realization that DEMANDING ASCII (and erroring on
non-ASCII chars) is better for the short term anyway, because we can
always decide later to relax the restrictions, but it's a lot worse to
add restrictions later. So I agree now: it should be ASCII. And no, I
don't have mail protocol skills.

The same goes for unquote accepting bytes. We can decide to make it
accept bytes later, but can't remove that feature later, so it's best
(IMHO) to let it NOT accept bytes (which is the current behaviour).

> The bytes > 127 would be translated as themselves; this follows
> logically from how stuff is parsed -- %% and %FF are translated,
> everything else is not. But I don't really care, I doubt there's a
> need.

Ah but what about unquote (to string)? If it accepted bytes then it
would be a bytes->str operation, and then you need a policy on DEcoding
those bytes. It makes things too complex I think.
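
To make the decoding-policy point concrete, here is a small sketch. It
assumes the urllib.parse API as it later shipped in Python 3 (unquote
with encoding/errors keywords, plus unquote_to_bytes), which is not
necessarily what the patch under discussion does. The same escape
yields different strings depending on the policy chosen, whereas the
bytes result needs no policy at all.

```python
from urllib.parse import unquote, unquote_to_bytes

escaped = "%E9"  # one raw byte, 0xE9; its meaning depends on the charset

# Bytes output: nothing to decode, so no policy is required.
raw = unquote_to_bytes(escaped)

# String output: the caller has to choose an encoding and an error policy.
as_latin1 = unquote(escaped, encoding="latin-1")
as_utf8_replace = unquote(escaped, encoding="utf-8", errors="replace")

print(repr(raw), repr(as_latin1), repr(as_utf8_replace))
```

Under Latin-1 the byte is a valid character; under UTF-8 it is an
invalid sequence, so the outcome depends entirely on the errors argument.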

> I believe patch 9 still has errors defaulting to strict for quote().
> Weren't you going to change that?

I raised it as a concern, but I thought you overruled on that, so I left
it as errors='strict'. What do you want it to be? 'replace'? Now that
this issue has been fully discussed, I'm happy with whatever you decide.

> From looking at it briefly I
> worry that the implementation is pretty slow -- a method call for each
> character and a map() call sounds pretty bad.

Yes, it does sound pretty bad. However, that's the current way of doing
things in both 2.x and 3.x; I didn't change it (though it looks like I
changed a LOT, I really did try to change as little as possible!)
Assuming it wasn't made _slower_ than before, can we ignore existing
performance issues and treat them as a separate matter, to be dealt
with after 3.0?

I'm not putting up a new patch now. The only fix I'd make is to add
Antoine's "or 'ascii'" to email/utils.py, as suggested on the review
tracker. I'll make this change along with any other recommendations
after your review.

(That is, Lib/email/utils.py line 222 becomes:
s = urllib.parse.quote(s, safe='', encoding=charset or 'ascii')
)
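
A minimal sketch of how that fallback behaves, assuming the quote()
signature from released Python 3. quote_param is a hypothetical
stand-in for the surrounding code in email/utils.py, not the actual
function.

```python
from urllib.parse import quote


def quote_param(s, charset=None):
    """Hypothetical stand-in for the quoting step in encode_rfc2231."""
    # `charset or 'ascii'` falls back to ASCII when charset is None or '',
    # so non-ASCII input is rejected rather than silently encoded as UTF-8.
    return quote(s, safe="", encoding=charset or "ascii")


plain = quote_param("a b/c")             # no charset: ASCII is demanded
explicit = quote_param("\u00e9", "utf-8")  # explicit charset is honoured

try:
    quote_param("\u00e9")                # non-ASCII with no charset
    rejected = False
except UnicodeEncodeError:
    rejected = True

print(plain, explicit, rejected)
```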

btw this Rietveld is amazing. I'm assuming I don't have permission to
upload patches there (can't find any button to do so) which is why I
keep posting them here and letting you upload to Rietveld ...

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________