Matt Giuca <[EMAIL PROTECTED]> added the comment:

I've been thinking more about the errors="strict" default. I think this was Guido's suggestion. I've decided I'd rather stick with errors="replace".
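For illustration, here is a minimal sketch of how the two settings differ. It runs against released Python 3, not the patch under discussion. It assumes `urllib.parse.unquote` and `urllib.parse.parse_qs` as they exist today; both accept `encoding` and `errors` arguments, and `errors='replace'` is the default. `parse_qs` stands in for the cgi module's query-string handling, because cgi has since been removed from the standard library. The query string "foo=w%FCt" is the Latin-1 example discussed below.

```python
from urllib.parse import parse_qs, unquote

# %FC is "ü" in Latin-1, but a lone 0xFC byte is not valid UTF-8.
latin1_query = "foo=w%FCt"

# errors="replace": the undecodable byte becomes U+FFFD and processing continues.
replaced = unquote("w%FCt", encoding="utf-8", errors="replace")

# errors="strict": the same input raises UnicodeDecodeError instead.
try:
    unquote("w%FCt", encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# parse_qs uses the default (UTF-8 with errors="replace"), so it yields U+FFFD.
fields = parse_qs(latin1_query)

# A caller who knows the real encoding can pass it explicitly and recover "wüt".
fields_latin1 = parse_qs(latin1_query, encoding="latin-1")

print(ascii(replaced), strict_failed, ascii(fields), ascii(fields_latin1))
```

With "replace", the caller still receives every field and can detect U+FFFD afterwards. With "strict", a single bad byte aborts the whole parse.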
I changed errors="replace" to errors="strict" in patch 8, but now I'm worried that this will cause problems, specifically for unquote. Once again, none of the code in the stdlib that calls unquote provides an errors option, so the default will be the only choice when using these other services. I'm concerned that there will be lots of unhandled exceptions flying around for URLs that aren't encoded in UTF-8, and that a conscientious programmer will not be able to protect against user errors.

Take the cgi module as an example. Typical usage is:

> fields = cgi.FieldStorage()
> foo = fields.getfirst("foo")

If the QUERY_STRING is "foo=w%FCt" (Latin-1), then with errors='strict' you get a UnicodeDecodeError when you call cgi.FieldStorage(). With errors='replace', the variable foo will be "w�t". In general, I'd rather have '�'s in my program (representing invalid user input) than exceptions, since this is usually a user input error, not a programming error. (One problem is that under errors='strict', the only way to handle this is to catch a UnicodeDecodeError on the call to FieldStorage, and then I can't access any of the data.)

Now, maybe we can think about propagating the "encoding" and "errors" arguments through to a few other major functions (such as cgi.parse_qsl, cgi.FieldStorage and urllib.parse.urlencode), but that should be done separately from this patch.

Python tracker <http://bugs.python.org/issue3300>