Jim Jewett <[EMAIL PROTECTED]> added the comment:

Matt,

Bill's main concern is with a policy decision; I doubt he would object to using your code once that is resolved.

The purpose of the quoting functions is to turn a string (representing the human-readable version) into bytes (that go over the wire). If everything is ASCII, there isn't any disagreement -- but it also isn't obvious that they're bytes instead of characters. So people started (well, continued, since it dates to pre-Unicode C) treating them as though they were strings.

The fact that ASCII (and therefore most wire protocols) looks the same as bytes or as characters was one of the strongest arguments against splitting the bytes and string types. Now that this has been done, Bill feels we should be consistent. (You feel wire-protocol bytes should be treated as strings, if only as bytestrings, because the libraries use them that way -- but this is a policy decision.)

To quote the final paragraph of section 1.2.1 of RFC 3986:

"""
In local or regional contexts and with improving technology, users might benefit from being able to use a wider range of characters; such use is not defined by this specification. Percent-encoded octets (Section 2.1) may be used within a URI to represent characters outside the range of the US-ASCII coded character set if this representation is allowed by the scheme or by the protocol element in which the URI is referenced. Such a definition should specify the character encoding used to map those characters to octets prior to being percent-encoded for the URI.
"""

So the mapping to bytes (or "octets") for non-ASCII isn't defined (here), and if you want to use it, you need to specify a charset. But in practice, people do use it without specifying a charset.

Which charset should be assumed? The old code (and test cases) assumed Latin-1. You want to assume UTF-8 (though you took the document charset when available -- which might also make sense).
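
To make the charset question concrete, here is a minimal sketch. It assumes the `urllib.parse` functions of current Python 3 (`quote`, `unquote`, and `unquote_to_bytes`, with their `encoding` argument), which is not necessarily the code under discussion in this issue, and the sample string is made up for illustration. It only shows that an ASCII string quotes the same way under either assumption, while a non-ASCII string produces different octets for Latin-1 and for UTF-8.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Hypothetical sample text: "cafe" with an acute accent on the final letter.
text = "caf\u00e9"

# Pure ASCII: the charset assumption makes no difference on the wire.
assert quote("cafe", encoding="latin-1") == quote("cafe", encoding="utf-8")

# Non-ASCII: the same string maps to different percent-encoded octets.
latin1_form = quote(text, encoding="latin-1")
utf8_form = quote(text, encoding="utf-8")
print(latin1_form)  # caf%E9
print(utf8_form)    # caf%C3%A9

# The raw octets behind each form.
print(unquote_to_bytes(latin1_form))  # b'caf\xe9'
print(unquote_to_bytes(utf8_form))    # b'caf\xc3\xa9'

# Decoding with the other side's assumption silently yields the wrong text:
# the two UTF-8 octets are read as two separate Latin-1 characters.
wrong = unquote(utf8_form, encoding="latin-1")
print(wrong == text)  # False
```

Whichever default is chosen, a receiver that assumes the other one gets mangled text without any error, which is why the choice is a policy question and not just an implementation detail.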

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________