Walter Dörwald added the comment: > >>> codecs.utf_8_decode('\u20ac'.encode('utf8')[:2]) > ('', 0) > > Oh... codecs.CODEC_decode are incremental decoders? I misunderstood completly > this.
No, those function are not decoders, they're just helper functions used to implement the real incremental decoders. That's why they're undocumented. Whether codecs.utf_8_decode() returns partial results or raises an exception depends on the final argument:: >>> s = '\u20ac'.encode('utf8')[:2] >>> codecs.utf_8_decode(s, 'strict') ('', 0) >>> codecs.utf_8_decode(s, 'strict', False) ('', 0) >>> codecs.utf_8_decode(s, 'strict', True) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data If you look at encodings/utf_8.py you see that the stateless decoder call codecs.utf_8_decode() with final==True:: def decode(input, errors='strict'): return codecs.utf_8_decode(input, errors, True) so the stateless decoder *will* raise exceptions for partial results. The incremental decoder simply passed on the final argument given to its encode() method. ---------- nosy: +doerwalter _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue15278> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com