STINNER Victor added the comment:

> The surrogateescape error handler is dangerous with utf-16/32. It can produce 
> globally invalid output.

I don't understand; can you give an example? surrogateescape generates invalid
encoded strings with any encoding. Example with UTF-8:

>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'

>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'

>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte

So str.encode("utf-8", "surrogateescape") produces an invalid UTF-8 sequence.
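The same experiment, written as a small script, is sketched below. It only exercises UTF-8, and the helper names roundtrip() and is_valid() are hypothetical, introduced here for illustration. It shows both halves of the point: surrogateescape restores the original bytes exactly, and those bytes still fail a strict decode.

```python
def roundtrip(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode with surrogateescape, then re-encode."""
    text = data.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


def is_valid(data: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if data decodes with the strict handler."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


raw = b"a\xffb"
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))        # 'a\udcffb': 0xff became the lone surrogate U+DCFF
out = text.encode("utf-8", "surrogateescape")
print(out == raw)        # True: the original bytes are restored exactly
print(is_valid(out))     # False: the restored bytes are not valid UTF-8

# Without surrogateescape, the lone surrogate cannot be encoded at all.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)    # surrogates not allowed
```

The lossless round trip is what the handler is for; producing bytes that are invalid in the target encoding is the expected consequence, not a UTF-16/32-specific defect.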

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18713>
_______________________________________