On 07/03/2015 16:25, Marko Rauhamaa wrote:
> Chris Angelico <ros...@gmail.com>:
>
>> On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
>>>> Marko Rauhamaa wrote:
>>>>> That said, UTF-8 does suffer badly from its not being
>>>>> a bijective mapping.
>>>>
>>>> Can you explain?
>>>
>>> In Python terms, there are bytes objects b that don't satisfy:
>>>
>>>     b.decode('utf-8').encode('utf-8') == b
>>
>> Please provide an example; that sounds like a bug. If there is any
>> invalid UTF-8 stream which decodes without an error, it is actually a
>> security bug, and should be fixed pronto in all affected and supported
>> versions.
>
> Here's an example:
>
>     b = b'\x80'
>
> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
> from str objects to bytes objects.

Python 2 might, Python 3 doesn't.
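
For anyone who wants to try this at home, here is a minimal sketch of the check being discussed, assuming Python 3 and its built-in UTF-8 codec. The helper name round_trips is made up for illustration; it is not anything from the standard library.

```python
def round_trips(b: bytes) -> bool:
    """Return True if b survives a UTF-8 decode/encode cycle unchanged.

    Hypothetical helper for illustration only.
    """
    try:
        return b.decode('utf-8').encode('utf-8') == b
    except UnicodeDecodeError:
        # Python 3 rejects invalid UTF-8 instead of decoding it lossily,
        # so such bytes never map to a str in the first place.
        return False


valid = 'h\u00e9llo'.encode('utf-8')   # well-formed UTF-8
lone_continuation = b'\x80'            # the example from this thread
overlong_nul = b'\xc0\x80'             # overlong encoding of U+0000

print(round_trips(valid))
print(round_trips(lone_continuation))
print(round_trips(overlong_nul))

# The surrogateescape error handler is one way to carry undecodable
# bytes through a str and get the same bytes back on encoding.
s = lone_continuation.decode('utf-8', errors='surrogateescape')
assert s.encode('utf-8', errors='surrogateescape') == lone_continuation
```

So, as far as I can tell, the mapping is only partial on the bytes side: every valid sequence round-trips, and invalid ones raise unless you ask for an error handler such as surrogateescape.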
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence