New submission from Serhiy Storchaka:

The error handler in unicode_escape_decode() consumes one or more bytes
following an illegal escape sequence, so those bytes are lost from the output.

>>> import codecs
>>> codecs.unicode_escape_decode(br'\u!@#', 'replace')
('�', 5)
>>> codecs.unicode_escape_decode(br'\u!@#$', 'replace')
('�@#$', 6)

raw_unicode_escape_decode() handles the same input correctly:

>>> codecs.raw_unicode_escape_decode(br'\u!@#', 'replace')
('�!@#', 5)
>>> codecs.raw_unicode_escape_decode(br'\u!@#$', 'replace')
('�!@#$', 6)
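
The comparison above could be turned into a small check along these lines. This
is only a sketch: the helper name tail_survives() and the sample list are
made up for illustration, and the expected outcome for unicode_escape_decode()
depends on whether the interpreter still has the behaviour reported here.

```python
import codecs

def tail_survives(decode, data, tail, errors="replace"):
    """Return True if the bytes after the bad escape are kept in the output.

    `decode` is one of the codecs.*_decode functions; `tail` is the part of
    `data` that follows the malformed escape (hypothetical helper, not part
    of the codecs module).
    """
    text, consumed = decode(data, errors)
    # The whole input must be reported as consumed, and the tail must still
    # appear at the end of the decoded string.
    return consumed == len(data) and text.endswith(tail.decode("ascii"))

# The two inputs from the report, paired with the bytes expected to survive.
samples = [(br"\u!@#", b"!@#"), (br"\u!@#$", b"!@#$")]

results = {
    name: [tail_survives(getattr(codecs, name), data, tail)
           for data, tail in samples]
    for name in ("unicode_escape_decode", "raw_unicode_escape_decode")
}
print(results)
```

With the behaviour shown in this report, the raw_unicode_escape_decode entries
are all True, while the unicode_escape_decode entries are False because '!' (and
for the shorter input, everything after it) is dropped.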

See also issue16975.

----------
assignee: serhiy.storchaka
components: Unicode
messages: 180077
nosy: ezio.melotti, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: Broken error handling in codecs.unicode_escape_decode()
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16979>
_______________________________________