Hello, I'm trying to use urllib.urlencode() correctly with unicode strings, and I wonder if some kind person could set me straight?
My understanding is that I am supposed to be able to urlencode anything up to the top half of latin-1 (decimal 128-255). But I can't just pass urlencode a unicode string:

Python 2.3.5 (#2, May 4 2005, 08:51:39)
[GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> s=u'abc'+unichr(246)+u'def'
>>> dct={'x':s}
>>> urllib.urlencode(dct)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/urllib.py", line 1206, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)

Is it instead Right that I should pass a unicode string to urlencode by first encoding it to 'latin-1'?

>>> import urllib
>>> s=u'abc'+unichr(246)+u'def'
>>> dct={'x':s.encode('latin-1')}
>>> urllib.urlencode(dct)
'x=abc%F6def'

If that is Right, I'm puzzled as to why urlencode doesn't do it itself. Or am I missing something? urllib.urlencode() contains these lines:

    elif _is_unicode(v):
        # is there a reasonable way to convert to ASCII?
        # encode generates a string, but "replace" or "ignore"
        # lose information and "strict" can raise UnicodeError
        v = quote_plus(v.encode("ASCII","replace"))
        l.append(k + '=' + v)

so I think that it is *not* liking latin-1.

Thank you,
Jim
--
http://mail.python.org/mailman/listinfo/python-list
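For readers on a current Python, here is a minimal sketch of the workaround described in the post: encode each text key and value with an explicit codec before percent-encoding, so the caller (not urlencode) decides which bytes go on the wire. It assumes Python 3, where urlencode lives in urllib.parse, percent-encodes bytes values byte by byte, and also accepts an encoding argument. The helper name urlencode_with_codec is hypothetical, and Latin-1 is only the right codec if the receiving server expects Latin-1; if it expects something else, such as UTF-8, pass that codec instead.

```python
# Sketch assuming Python 3 (urlencode moved to urllib.parse).
# Assumption: the receiving server decodes query values as `codec`.
from urllib.parse import urlencode


def urlencode_with_codec(params, codec="latin-1"):
    """Hypothetical helper: encode text keys/values with `codec` before
    percent-encoding, mirroring the s.encode('latin-1') workaround."""
    encoded = {}
    for key, value in params.items():
        if isinstance(key, str):
            key = key.encode(codec)
        if isinstance(value, str):
            value = value.encode(codec)
        encoded[key] = value
    # urlencode percent-encodes bytes keys and values without re-encoding them.
    return urlencode(encoded)


s = "abc" + chr(246) + "def"
print(urlencode_with_codec({"x": s}))           # x=abc%F6def
print(urlencode_with_codec({"x": s}, "utf-8"))  # x=abc%C3%B6def
# Python 3's urlencode can also apply the codec itself:
print(urlencode({"x": s}, encoding="latin-1"))  # x=abc%F6def
```

The same character comes out as one byte (%F6) under Latin-1 and two bytes (%C3%B6) under UTF-8, which is why the codec choice has to come from whoever knows what the server expects.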