[issue4757] reject unicode in zlib

Marc-Andre Lemburg Sat, 27 Dec 2008 05:13:22 -0800

Marc-Andre Lemburg <[email protected]> added the comment:

On 2008-12-27 13:58, STINNER Victor wrote:
> Python 2.x allows compressing any byte string (str) and any ASCII-only
> unicode string (unicode):
> 
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>>> import zlib
>>>> zlib.compress('abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
>>>> zlib.compress(u'abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
>>>> zlib.compress(u'abc\xe9')
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...
> 
> I'm not sure that this behaviour was really wanted, because the
> decompress operation is not symmetric (the result type is always a byte
> string):
> 
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>>> import zlib
>>>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
> 'abc'
>


I don't see a problem with this. The fact that Python 2.x also
accepts ASCII-only Unicode strings where strings are normally expected
is intended to help with the migration to Unicode, so the above
behaviour is expected.
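To make the Python 2 behaviour concrete, here is a minimal Python 3 sketch that emulates the implicit coercion described above, assuming (as in the quoted session) that the default codec is 'ascii'. The helper name py2_style_compress is hypothetical; it is not part of zlib.

```python
import zlib


def py2_style_compress(data):
    """Hypothetical helper: mimic Python 2's implicit coercion of unicode
    to str via the default 'ascii' codec before compressing."""
    if isinstance(data, str):
        # Python 2 effectively did data.encode('ascii') behind the scenes;
        # any non-ASCII character raises UnicodeEncodeError here.
        data = data.encode("ascii")
    return zlib.compress(data)


# ASCII-only text compresses to exactly the same stream as the equivalent bytes.
print(py2_style_compress("abc") == zlib.compress(b"abc"))

# Non-ASCII text is rejected, like u'abc\xe9' in the quoted session.
try:
    py2_style_compress("abc\xe9")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```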

zlib itself doesn't care whether the data to be compressed
is text or bytes.
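Because zlib only ever sees a stream of bytes, the choice of how text becomes bytes lies entirely with the caller. A short Python 3 illustration, using 'utf-8' and 'latin-1' purely as example codecs:

```python
import zlib

text = "abc\xe9"

# zlib only receives bytes; the caller picks the codec that turns text into bytes.
utf8_stream = zlib.compress(text.encode("utf-8"))
latin1_stream = zlib.compress(text.encode("latin-1"))

# Same text, different encodings -> different input bytes -> different streams.
print(utf8_stream != latin1_stream)

# Decompression hands back the original bytes; it has no notion of "text".
restored = zlib.decompress(latin1_stream)
print(type(restored).__name__, restored)

# Getting the text back requires decoding with the same codec used to encode.
print(restored.decode("latin-1") == text)
```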

In Python 3.x, it's probably better to use bytes throughout the
API.
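A minimal sketch of the bytes-only style in Python 3: in current releases, zlib.compress() accepts bytes-like objects and raises TypeError for str, so text has to be encoded and decoded explicitly. The helpers compress_text() and decompress_text(), and their utf-8 default, are illustrative assumptions rather than part of the zlib module.

```python
import zlib


def compress_text(text, encoding="utf-8"):
    """Hypothetical helper: encode explicitly, then compress the bytes."""
    return zlib.compress(text.encode(encoding))


def decompress_text(data, encoding="utf-8"):
    """Hypothetical helper: decompress to bytes, then decode explicitly."""
    return zlib.decompress(data).decode(encoding)


# zlib.compress() takes bytes-like objects and rejects str outright.
try:
    zlib.compress("abc")
except TypeError as exc:
    print("TypeError:", exc)

# With the codec chosen by the caller, text round-trips symmetrically.
packed = compress_text("abc\xe9")
print(decompress_text(packed) == "abc\xe9")
```

Keeping the codec at the call site makes the compress/decompress pair symmetric (bytes in, bytes out), which sidesteps the asymmetry pointed out in the quoted message.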

----------
nosy: +lemburg

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue4757>
_______________________________________