[issue15278] UnicodeDecodeError when readline in codecs.py

Walter Dörwald Wed, 10 Oct 2012 02:49:37 -0700

Walter Dörwald added the comment:

> >>> codecs.utf_8_decode('\u20ac'.encode('utf8')[:2])
> ('', 0)
>
> Oh... codecs.CODEC_decode are incremental decoders? I misunderstood completly 
> this.


No, those function are not decoders, they're just helper functions used to 
implement the real incremental decoders. That's why they're undocumented.

Whether codecs.utf_8_decode() returns partial results or raises an exception 
depends on the final argument::

>>> s = '\u20ac'.encode('utf8')[:2]
>>> codecs.utf_8_decode(s, 'strict')
('', 0)
>>> codecs.utf_8_decode(s, 'strict', False)
('', 0)
>>> codecs.utf_8_decode(s, 'strict', True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: 
unexpected end of data

If you look at encodings/utf_8.py you see that the stateless decoder call 
codecs.utf_8_decode() with final==True::

    def decode(input, errors='strict'):
        return codecs.utf_8_decode(input, errors, True)

so the stateless decoder *will* raise exceptions for partial results. The 
incremental decoder simply passed on the final argument given to its encode() 
method.

----------
nosy: +doerwalter

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15278>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue15278] UnicodeDecodeError when readline in codecs.py

Reply via email to