[issue7961] Py3k: decoding empty bytestring with invalid encoding throws no error

Marc-Andre Lemburg Fri, 19 Feb 2010 02:34:23 -0800

Marc-Andre Lemburg <m...@egenix.com> added the comment:

Ori Avtalion wrote:
> 
> Ori Avtalion <o...@avtalion.name> added the comment:
> 
> Ignoring the custom utf-8/latin-1 conversion functions, the actual check 
> for whether a codec exists is done in Python/codecs.c's PyCodec_Decode.
> 
> Is that where I should move the aforementioned optimization to?


That's not a good idea: codecs that are not used for decoding
into Unicode may very well return something other than an empty
string when passed an empty string as input, e.g. the zlib codec
compresses an empty string to

'x\x9c\x03\x00\x00\x00\x00\x01'
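
A minimal sketch of this point, assuming a Python 3 release that ships
the "zlib_codec" bytes-to-bytes codec and exposes it through
codecs.encode()/codecs.decode(). It only illustrates that empty input
does not imply empty output; it is not the code under discussion:

```python
import codecs

# "Encoding" with zlib_codec means compressing; the codec maps bytes to bytes.
compressed = codecs.encode(b"", "zlib_codec")

# Even for empty input the result carries a zlib header and checksum,
# so a generic "empty in, empty out" shortcut would be wrong here.
print(compressed)

# Decoding (decompressing) gets the empty input back.
restored = codecs.decode(compressed, "zlib_codec")
print(restored)
```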

> Is it safe to assume that the decoded object is always a string/bytestring?

No, that's not safe, especially not in the codecs module. Codecs
can return arbitrary types. It's up to each codec to decide
which type combinations it supports.

In Python 3.x we check the types in the unicode.encode()/
bytes.decode() methods, but that's only a specific use case
in those helper methods, not a general limitation of the
codec architecture.
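
The difference between the helper methods and the general codec
machinery can be sketched roughly as follows. This assumes a recent
Python 3 where "hex_codec" and "rot_13" are available; the exact
exception type raised by bytes.decode() has varied between releases,
so the sketch accepts either LookupError or TypeError:

```python
import codecs

data = b"68656c6c6f"  # hex digits spelling "hello"

# bytes.decode() is restricted to codecs that produce str, so a
# bytes-to-bytes codec is rejected by the helper method.
try:
    data.decode("hex_codec")
    rejected = False
except (LookupError, TypeError):
    rejected = True

# The codecs module itself has no such restriction.
decoded = codecs.decode(data, "hex_codec")   # bytes -> bytes
rotated = codecs.encode("hello", "rot_13")   # str -> str

print(rejected, decoded, rotated)
```

The type check lives in the str/bytes methods; the functions in the
codecs module pass through whatever type the codec returns.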

It's planned to add new .transform()/.untransform() methods
to unicode and bytes which will then provide access to
same-type codecs.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7961>
_______________________________________