Chris Angelico <ros...@gmail.com>:

> On Sun, Mar 8, 2015 at 3:25 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>>>> Marko Rauhamaa wrote:
>>>>>> That said, UTF-8 does suffer badly from its not being
>>>>>> a bijective mapping.
>>>>>
>> Here's an example:
>>
>>    b = b'\x80'
>>
>> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
>> from str objects to bytes objects.
>
> That's not the same as what you said.

Except that it's precisely what I said.

> All you've proven is that there are bit patterns which are not UTF-8
> streams...

And that causes problems.

> which is a very deliberate feature.

Well, nobody desired it. It was just something that had to give. I
believe you *could* have defined it as a bijective mapping but then you
would have lost the sorting order correspondence.

> How does UTF-8 *suffer* from this? It benefits hugely!

You can't operate on file names and text files using Python strings. Or
at least, you will need to add (nontrivial) exception catching logic.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list
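
For anyone following along, here is a minimal sketch of the point being argued, using the same byte string b'\x80' from the quoted example. It shows that strict UTF-8 decoding rejects some byte sequences, which is why code that handles arbitrary bytes as str needs either exception handling or a non-strict error handler. The surrogateescape handler (PEP 383) is not mentioned in this thread; it is included here only as an assumption about one way to make the bytes survive a round trip, and the variable names are illustrative.

```python
raw = b'\x80'  # a lone continuation byte: not a valid UTF-8 stream

# Strict decoding refuses the byte sequence outright.
try:
    raw.decode('utf-8')
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# Assumption (not from the thread): the surrogateescape error handler
# maps each undecodable byte to a lone surrogate code point, so the
# original bytes can be recovered from the resulting str.
escaped = raw.decode('utf-8', errors='surrogateescape')
round_trip = escaped.encode('utf-8', errors='surrogateescape')

# The escaped str is still not encodable with the strict handler,
# so the exception merely moves to a different place.
try:
    escaped.encode('utf-8')
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

print(strict_decode_ok, repr(escaped), round_trip == raw, strict_encode_ok)
```

Either way some extra handling is needed: the strict codec raises on input like this, and the escaped string raises later if it is encoded without the same handler.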