Fredrik Lundh wrote:

> George Sakkis wrote:
>
> > The following snippet results in different outcome for (at least) the
> > last three major releases:
> >
> > >>> import urllib
> > >>> urllib.unquote(u'%94')
> >
> > # Python 2.3.4
> > u'%94'
> >
> > # Python 2.4.2
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
> > ordinal not in range(128)
> >
> > # Python 2.5
> > u'\x94'
> >
> > Is the current version the "right" one or is this function supposed to
> > change every other week ?
>
> why are you passing non-ASCII Unicode strings to a function designed for
> fixing up 8-bit strings in the first place? if you do proper encoding
> before you quote things, it'll work the same way in all Python releases.
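For reference, here is a minimal sketch of the "encode first, then quote" round trip being suggested. It is written against Python 3's `urllib.parse`, which postdates the releases discussed in this thread, and both the sample string and the choice of latin-1 are only assumptions for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

# Sample text with one non-ASCII character (an assumption for illustration).
text = "caf\xe9"

# Encode to bytes first, so quoting only ever sees 8-bit data.
encoded = text.encode("latin-1")
quoted = quote(encoded)

# Reverse the steps: unquote to bytes, then decode with the same codec.
roundtrip = unquote_to_bytes(quoted).decode("latin-1")

print(quoted)
print(roundtrip == text)
```

As long as both sides agree on the codec, the quoting step never has to guess how to treat non-ASCII characters.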

I'm using BeautifulSoup, which as of version 3 returns only Unicode, and I stumbled on a page with bogus character encodings like this one. My impression is that whatever generated the page used ord() (that is, the decimal ordinal) to encode reserved characters instead of the proper hex representation in latin-1. If that's the case, unquote() won't help anyway, and I'd have to call chr() on the numeric part myself.

George
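
The chr() workaround mentioned above could look roughly like the sketch below. It assumes the page really did write decimal ordinals after the percent sign, which is only a guess, and `unquote_decimal` is a hypothetical helper name, not part of urllib. It is written for Python 3; on Python 2 with Unicode input, unichr() would take the place of chr().

```python
import re

# Matches a percent sign followed by one to three decimal digits.
_DECIMAL_ESCAPE = re.compile(r"%([0-9]{1,3})")


def unquote_decimal(text):
    """Replace each %NNN escape, read as a decimal ordinal, with chr(NNN)."""
    return _DECIMAL_ESCAPE.sub(lambda m: chr(int(m.group(1))), text)


print(unquote_decimal("a%94b"))
```

Under that assumption '%94' stands for ordinal 94, i.e. '^', not byte 0x94. Note that the match is greedy, so an escape directly followed by a literal digit is ambiguous and would be misread.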