Serhiy Storchaka <storch...@gmail.com> added the comment:

The problem is that "surrogatepass" is specific to utf-8, and there is no
standard way to decode lone surrogates in utf-16.
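
For reference, a minimal sketch of the utf-8 case, where "surrogatepass" is well defined and the round trip succeeds. The helper name roundtrip_utf8 is only illustrative and not part of any API.

```python
def roundtrip_utf8(text):
    """Encode and decode text as utf-8, letting lone surrogates through."""
    data = text.encode("utf-8", "surrogatepass")
    return data, data.decode("utf-8", "surrogatepass")

lone = "\udc80\udc80"
data, back = roundtrip_utf8(lone)

# Each lone surrogate is written as an ordinary 3-byte utf-8 sequence.
assert data == b"\xed\xb2\x80\xed\xb2\x80"
assert back == lone

# Without the handler, the strict utf-8 codec rejects lone surrogates.
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    strict_rejected = True
else:
    strict_rejected = False
```

The strict encode is included only to show that it is the error handler, not the codec itself, that lets the surrogates through.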

>>> "\udc80\udc80".encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 2-3: illegal encoding

With utf-32 this "works" only thanks to a bug -- utf-32 allows lone
surrogates (issue #12892).

If "surrogatepass" worked with utf-16 and utf-32, it would be natural to
raise ValueError for other encodings.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13916>
_______________________________________