On Sun, Mar 8, 2015 at 3:25 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Chris Angelico <ros...@gmail.com>:
>
>> On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
>>>
>>>> Marko Rauhamaa wrote:
>>>>
>>>>> That said, UTF-8 does suffer badly from its not being
>>>>> a bijective mapping.
>>>>
>>>> Can you explain?
>>>
>>> In Python terms, there are bytes objects b that don't satisfy:
>>>
>>>    b.decode('utf-8').encode('utf-8') == b
>>
>> Please provide an example; that sounds like a bug. If there is any
>> invalid UTF-8 stream which decodes without an error, it is actually a
>> security bug, and should be fixed pronto in all affected and supported
>> versions.
>
> Here's an example:
>
>    b = b'\x80'
>
> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
> from str objects to bytes objects.

That's not the same as what you said. All you've proven is that there
are bit patterns which are not UTF-8 streams... which is a very
deliberate feature. How does UTF-8 *suffer* from this? It benefits
hugely!

ChrisA
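
To make the distinction concrete, here is a minimal sketch, assuming Python 3. The helper names bytes_round_trip and str_round_trips are made up for illustration and are not part of any library. It shows that b'\x80' is rejected outright rather than decoded into something that fails to round-trip, while byte strings that do decode come back unchanged.

```python
def bytes_round_trip(b: bytes) -> bool:
    """Return True if b decodes as UTF-8 and re-encodes to the same bytes.

    Returns False when b is not a valid UTF-8 stream at all (the decoder
    raises instead of guessing).
    """
    try:
        return b.decode('utf-8').encode('utf-8') == b
    except UnicodeDecodeError:
        return False


def str_round_trips(s: str) -> bool:
    """Return True if s survives encode followed by decode unchanged."""
    return s.encode('utf-8').decode('utf-8') == s


if __name__ == '__main__':
    # A lone continuation byte is not a UTF-8 stream: decoding raises.
    print(bytes_round_trip(b'\x80'))
    # An overlong encoding of NUL is also rejected by the strict decoder.
    print(bytes_round_trip(b'\xc0\x80'))
    # Valid UTF-8 comes back byte for byte.
    print(bytes_round_trip('h\u00e9llo'.encode('utf-8')))
    # Going the other way, text survives the trip through bytes.
    print(str_round_trips('h\u00e9llo \U0001F600'))
```

The first two calls report a failure only because decoding raises UnicodeDecodeError. Nothing invalid decodes silently, which is the property under discussion.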