Chris Angelico <ros...@gmail.com>:

> On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
>>
>>> Marko Rauhamaa wrote:
>>>
>>>> That said, UTF-8 does suffer badly from its not being
>>>> a bijective mapping.
>>>
>>> Can you explain?
>>
>> In Python terms, there are bytes objects b that don't satisfy:
>>
>>    b.decode('utf-8').encode('utf-8') == b
>
> Please provide an example; that sounds like a bug. If there is any
> invalid UTF-8 stream which decodes without an error, it is actually a
> security bug, and should be fixed pronto in all affected and supported
> versions.

Here's an example:

   b = b'\x80'

Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
from str objects to bytes objects.


Marko
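
For anyone who wants to try the example, here is a minimal sketch of the
round-trip check under discussion. The helper name round_trips is
hypothetical, introduced only for illustration. It assumes Python 3 and
the default strict error handler.

```python
def round_trips(b: bytes) -> bool:
    """Return True if b survives decode/encode through UTF-8 unchanged.

    Hypothetical helper, not part of the standard library.
    """
    try:
        return b.decode('utf-8').encode('utf-8') == b
    except UnicodeDecodeError:
        # b is not a valid UTF-8 byte sequence, so the strict decoder
        # refuses it and no str object corresponds to it.
        return False


print(round_trips(b'hello'))             # True
print(round_trips('é'.encode('utf-8')))  # True
print(round_trips(b'\x80'))              # False
```

The last call returns False because b'\x80' is a lone continuation byte,
so decoding raises UnicodeDecodeError rather than producing a str.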