Marc-Andre Lemburg <m...@egenix.com> added the comment:

Ori Avtalion wrote:
>
> Ori Avtalion <o...@avtalion.name> added the comment:
>
> Ignoring the custom utf-8/latin-1 conversion functions, the actual checking
> if a codec exists is done in Python/codecs.c's PyCodec_Decode.
>
> Is that where I should move the aforementioned optimization to?

That's not a good idea, since codecs that are not used for decoding
into Unicode may very well return something other than an empty
string if passed an empty string on input, e.g.
'x\x9c\x03\x00\x00\x00\x00\x01'

> Is it safe to assume that the decoded object is always a string/bytestring?

No, that's not safe, esp. not in the codecs module. Codecs can return
arbitrary types. It's up to the codecs to decide what type combinations
they support.

In Python 3.x we check the types in the unicode.encode()/
bytes.decode() methods, but that's only a specific use case
in those helper methods, not a general limitation of the
codec architecture.

It's planned to add new .transform()/.untransform() methods to
unicode and bytes which will then provide access to same-type
codecs.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7961>
_______________________________________
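
To illustrate the two points made in the reply, here is a minimal sketch, assuming Python 3.4 or later, where the generic codecs.encode()/codecs.decode() functions accept the "zlib_codec" and "rot_13" codecs. The byte string quoted in the message looks like zlib output for empty input; the sketch only checks that the result is non-empty rather than relying on exact bytes. The variable names are illustrative only.

```python
import codecs

# A bytes-to-bytes codec: empty input does not produce empty output,
# so short-circuiting on empty input in the generic codec machinery
# would change the result.
compressed = codecs.encode(b"", "zlib_codec")
round_tripped = codecs.decode(compressed, "zlib_codec")

# A str-to-str codec: the generic codecs API does not constrain
# the input/output type combination.
rotated = codecs.encode("abc", "rot_13")

# The str.encode() helper, by contrast, checks types and rejects
# codecs that are not text encodings.
try:
    "abc".encode("rot_13")
    helper_rejected = False
except (LookupError, TypeError):
    helper_rejected = True

print(len(compressed) > 0, round_tripped == b"", rotated, helper_rejected)
```

The type check therefore lives in the str/bytes helper methods, not in the codec lookup itself, which is why an empty-input shortcut is only safe in those helpers.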