Re: urllib.unquote and unicode

2006-12-22 Thread Martin v. Löwis
Duncan Booth wrote:
> So you believe that because something is only recommended by a standard
> Python should refuse to implement it?

Yes. In the face of ambiguity, refuse the temptation to guess. This is *deeply* ambiguous; people have been using all kinds of encodings in http URLs.

> You do

Re: urllib.unquote and unicode

2006-12-22 Thread Duncan Booth
"Martin v. Löwis" wrote:
The way that uri encoding is supposed to work is that first the input string in unicode is encoded to UTF-8 and then each byte which is not in the permitted range for characters is encoded as % followed by two hex characters.
>>> Ca

Re: urllib.unquote and unicode

2006-12-21 Thread Martin v. Löwis
>>> The way that uri encoding is supposed to work is that first the input
>>> string in unicode is encoded to UTF-8 and then each byte which is not in
>>> the permitted range for characters is encoded as % followed by two hex
>>> characters.
>> Can you back up this claim ("is supposed to work") by

Re: urllib.unquote and unicode

2006-12-21 Thread Walter Dörwald
Martin v. Löwis wrote:
> Duncan Booth wrote:
>> The way that uri encoding is supposed to work is that first the input
>> string in unicode is encoded to UTF-8 and then each byte which is not in
>> the permitted range for characters is encoded as % followed by two hex
>> characters.
>
> Can you

Re: urllib.unquote and unicode

2006-12-20 Thread Duncan Booth
"Martin v. Löwis" wrote:
> Duncan Booth wrote:
>> The way that uri encoding is supposed to work is that first the input
>> string in unicode is encoded to UTF-8 and then each byte which is not
>> in the permitted range for characters is encoded as % followed by two
>> hex cha

Re: urllib.unquote and unicode

2006-12-19 Thread Martin v. Löwis
Duncan Booth wrote:
> The way that uri encoding is supposed to work is that first the input
> string in unicode is encoded to UTF-8 and then each byte which is not in
> the permitted range for characters is encoded as % followed by two hex
> characters.

Can you back up this claim ("is supposed
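The encoding scheme described in the quoted paragraph can be sketched as follows. This is only an illustration, assuming Python 3 (the thread itself concerns Python 2) and assuming the "permitted range" means the unreserved characters of RFC 3986; the helper name `percent_encode_utf8` is hypothetical and not part of `urllib`.

```python
import string

# Assumption: "permitted" characters are the RFC 3986 unreserved set.
UNRESERVED = frozenset((string.ascii_letters + string.digits + "-._~").encode("ascii"))

def percent_encode_utf8(text):
    """Hypothetical helper: encode text as UTF-8, then escape every
    byte outside UNRESERVED as '%' followed by two hex digits."""
    pieces = []
    for byte in text.encode("utf-8"):  # iterating bytes yields ints in Python 3
        if byte in UNRESERVED:
            pieces.append(chr(byte))
        else:
            pieces.append("%{:02X}".format(byte))
    return "".join(pieces)

encoded = percent_encode_utf8("Löwis")  # 'ö' is two bytes in UTF-8: C3 B6
```

Whether a given URL was actually produced this way is exactly what the thread disputes; the sketch shows only the UTF-8 convention, not what any particular client sends.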

Re: urllib.unquote and unicode

2006-12-19 Thread George Sakkis
Fredrik Lundh wrote:
> George Sakkis wrote:
>
> > The following snippet results in different outcome for (at least) the
> > last three major releases:
> >
> > >>> import urllib
> > >>> urllib.unquote(u'%94')
> >
> > # Python 2.3.4
> > u'%94'
> >
> > # Python 2.4.2
> > UnicodeDecodeError: 'ascii' code

Re: urllib.unquote and unicode

2006-12-19 Thread Duncan Booth
"Leo Kislov" wrote:
> George Sakkis wrote:
>> The following snippet results in different outcome for (at least) the
>> last three major releases:
>>
>> >>> import urllib
>> >>> urllib.unquote(u'%94')
>>
>> # Python 2.3.4
>> u'%94'
>>
>> # Python 2.4.2
>> UnicodeDecodeError: 'as

Re: urllib.unquote and unicode

2006-12-19 Thread Fredrik Lundh
George Sakkis wrote:
> The following snippet results in different outcome for (at least) the
> last three major releases:
>
> >>> import urllib
> >>> urllib.unquote(u'%94')
>
> # Python 2.3.4
> u'%94'
>
> # Python 2.4.2
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
> o

Re: urllib.unquote and unicode

2006-12-19 Thread Peter Otten
George Sakkis wrote:
> The following snippet results in different outcome for (at least) the
> last three major releases:
>
> >>> import urllib
> >>> urllib.unquote(u'%94')
>
> # Python 2.4.2
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
> ordinal not in range(128)

Pytho

Re: urllib.unquote and unicode

2006-12-18 Thread Leo Kislov
George Sakkis wrote:
> The following snippet results in different outcome for (at least) the
> last three major releases:
>
> >>> import urllib
> >>> urllib.unquote(u'%94')
>
> # Python 2.3.4
> u'%94'
>
> # Python 2.4.2
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
> or
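To see why `%94` is an awkward input, it may help to separate unquoting from decoding. The sketch below assumes Python 3's `urllib.parse` (not the Python 2 `urllib` module discussed in the thread) and simply shows that the single byte 0x94 means different things under different encodings; the choice of cp1252 and latin-1 is illustrative.

```python
from urllib.parse import unquote_to_bytes

# Unquoting alone yields raw bytes; no character encoding is involved yet.
raw = unquote_to_bytes("%94")

# The same byte decodes to different characters depending on the codec.
as_cp1252 = raw.decode("cp1252")    # right double quotation mark
as_latin1 = raw.decode("latin-1")   # a C1 control character

# A lone 0x94 is a continuation byte, so it is not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Nothing in the escaped string itself says which of these readings the sender intended, which is the ambiguity argued over in the later replies.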