Matt Giuca <[EMAIL PROTECTED]> added the comment:

Guido suggested that quote's "safe" parameter should allow any character, not just those in the ASCII range. I've implemented this now. It was a lot messier than I imagined.

The problem is that in my older patches, both 's' and 'safe' are encoded to bytes right away, and the rest of the process is just octet encoding (matching each byte against the safe set to decide whether or not to quote it).

The new implementation has to delay encoding both of these until the iteration over the string, so that each *character* is matched against the safe set and is encoded only if it is not in 'safe'.

The complication is that some encoding/errors combinations produce bytes which are themselves in the safe range. For instance, quote('\u6f22', encoding='latin-1', errors='xmlcharrefreplace') should give "%26%2328450%3B" (which is "&#28450;" encoded). To preserve this behaviour, each *byte* of the encoded character then has to be checked against a 'safe bytes' set. I believe that will slow down the implementation considerably.

In summary, it requires two levels of matching: first characters, then bytes. You can see how messy it made my quote implementation; I've attached the patch (parse.py.patch8+allsafe).

I don't think it's worth the extra code bloat and performance hit just to implement a feature whose only use is producing invalid URIs (since URIs are supposed to contain only ASCII characters). Does anyone disagree and want this feature in?

Added file: http://bugs.python.org/file11092/parse.py.patch8+allsafe

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________
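
For anyone who does not want to read the patch, the two-level check described above can be sketched roughly as follows. This is only an illustration, not the code in parse.py.patch8+allsafe: the name quote_allsafe, the ALWAYS_SAFE set, and the defaults are assumptions made for the example. It also assumes that encoding one character at a time gives the same bytes as encoding the whole string, which does not hold for encodings that emit a BOM or keep state.

```python
# Hypothetical sketch of a quote() whose 'safe' may contain any character.
# Not the attached patch; names and defaults are assumed for illustration.

# Characters never quoted (an assumed set for this example).
ALWAYS_SAFE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_.-"
)


def quote_allsafe(s, safe="/", encoding="utf-8", errors="strict"):
    # Level 1 set: characters (possibly non-ASCII) that pass through as-is.
    safe_chars = ALWAYS_SAFE | set(safe)
    # Level 2 set: byte values that pass through after encoding.
    # Only the ASCII members of safe_chars can be matched byte-wise.
    safe_bytes = frozenset(ord(c) for c in safe_chars if ord(c) < 128)

    out = []
    for ch in s:
        # Level 1: match the *character* against the safe set.
        if ch in safe_chars:
            out.append(ch)
            continue
        # Level 2: encode the character, then match each *byte*.
        for b in ch.encode(encoding, errors):
            if b in safe_bytes:
                out.append(chr(b))
            else:
                out.append("%{:02X}".format(b))
    return "".join(out)


if __name__ == "__main__":
    # The xmlcharrefreplace case from the comment above.
    print(quote_allsafe("\u6f22", encoding="latin-1",
                        errors="xmlcharrefreplace"))
    # A non-ASCII character listed in 'safe' is left untouched.
    print(quote_allsafe("\u6f22/x y", safe="/\u6f22"))
```

The inner loop over bytes is the extra cost being discussed: every character that is not in 'safe' pays for an encode call plus a per-byte membership test, where the older patches encode the whole string once.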