Jim wrote:
> My understanding is that I am supposed to be able to urlencode anything
> up to the top half of latin-1 -- decimal 128-255.

I believe your understanding is incorrect. Without being able to quote
RFCs precisely, I think your understanding should be this:

- the URL literal syntax only allows for ASCII characters
- bytes with no meaning in ASCII can be quoted through %hh in URLs
- the precise meaning of such bytes in the URL is defined in the
  URL scheme, and may vary from URL scheme to URL scheme
- the http scheme does not specify any interpretation of the bytes,
  but apparently assumes that they denote characters, and follow
  some encoding
- which encoding is something that the web server defines, when
  mapping URLs to resources.

If you get the impression that this is underspecified: your impression
is correct; it is underspecified indeed.

There is a recent attempt to tighten the specification through IRIs.
The IRI RFC (RFC 3987) defines a mapping between IRIs and URIs, and it
uses UTF-8 as the encoding, not latin-1.

Regards,
Martin
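
To illustrate the point about encodings, here is a minimal sketch. It
assumes Python 3's urllib.parse (not something the original message
refers to), and the sample character is only an example. It shows that
one character from the upper half of latin-1 yields different %hh
sequences depending on the encoding chosen, and that a receiver assuming
the other encoding reads different text.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Example character (an assumption for illustration): U+00E9,
# LATIN SMALL LETTER E WITH ACUTE, byte 0xE9 (decimal 233) in latin-1.
text = "\u00e9"

# The same character quoted under two encodings gives two different URLs.
as_latin1 = quote(text, encoding="latin-1")  # one byte:  %E9
as_utf8 = quote(text, encoding="utf-8")      # two bytes: %C3%A9

# On the wire there are only bytes; the URL itself does not say
# which encoding produced them.
raw_bytes = unquote_to_bytes(as_latin1)      # b'\xe9'

# A receiver that assumes latin-1 for UTF-8-quoted data sees two characters.
misread = unquote(as_utf8, encoding="latin-1")

print(as_latin1, as_utf8, raw_bytes, ascii(misread))
```

Which of the two forms a given server accepts depends on how that server
maps URLs to resources, so neither can be assumed to work everywhere.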