Re: Newbie question about text encoding

Marko Rauhamaa Sat, 07 Mar 2015 08:57:32 -0800

Chris Angelico <ros...@gmail.com>:

> On Sun, Mar 8, 2015 at 3:25 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>>>> Marko Rauhamaa wrote:
>>>>>> That said, UTF-8 does suffer badly from its not being
>>>>>> a bijective mapping.
>>>>>
>> Here's an example:
>>
>>    b = b'\x80'
>>    b.decode('utf-8')
>>
>> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
>> from str objects to bytes objects.
>
> That's not the same as what you said.


Except that it's precisely what I said.

> All you've proven is that there are bit patterns which are not UTF-8
> streams...

And that causes problems.
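
To make that concrete, here is a minimal sketch (the sample values are
mine, picked only for illustration). Going from str to bytes and back is
lossless for ordinary text, but going from bytes to str is not always
possible:

```python
# str -> bytes -> str round-trips for any text without lone surrogates.
text = "na\u00efve \u20ac"
assert text.encode("utf-8").decode("utf-8") == text

# bytes -> str is another matter: these byte strings are not valid UTF-8.
# b"\x80" is a lone continuation byte, b"\xc0\xaf" is an overlong form,
# and b"\xff" can never appear in UTF-8 at all.
invalid_samples = [b"\x80", b"\xc0\xaf", b"\xff"]

rejected = []
for raw in invalid_samples:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        rejected.append(raw)

print(rejected)
```

All three samples end up in `rejected`, so the mapping covers only part
of the space of bytes objects.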

> which is a very deliberate feature.

Well, nobody desired it. It was just something that had to give.

I believe you *could* have defined it as a bijective mapping but then
you would have lost the sorting order correspondence.
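
By sorting order correspondence I mean that comparing UTF-8 byte strings
gives the same order as comparing the code points. A small sketch of
that property, with a word list I made up for the purpose:

```python
# Hypothetical sample covering 1-, 2-, 3- and 4-byte UTF-8 sequences.
words = ["z", "a", "\u00e9", "\u20ac", "\U0001f600", "ab", "A"]

# Python compares str objects code point by code point.
by_code_point = sorted(words)

# Sort the encoded forms as plain bytes, then decode for comparison.
by_utf8_bytes = [
    encoded.decode("utf-8")
    for encoded in sorted(word.encode("utf-8") for word in words)
]

assert by_code_point == by_utf8_bytes
```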

> How does UTF-8 *suffer* from this? It benefits hugely!

You can't safely operate on arbitrary file names and text files using
Python strings, because their bytes need not be valid UTF-8. Or at
least, you will need to add (nontrivial) exception catching logic.
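
A rough sketch of what that looks like in practice. The file name below
is hypothetical; the "surrogateescape" error handler is the one
described in PEP 383:

```python
# Hypothetical file name whose bytes are not valid UTF-8.
raw_name = b"report-\x80.txt"

# Strict decoding forces the caller to catch the exception.
try:
    name = raw_name.decode("utf-8")
except UnicodeDecodeError:
    name = None

# "surrogateescape" maps each undecodable byte to a lone surrogate
# (0x80 becomes U+DC80), so the original bytes can be recovered later.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
round_tripped = escaped.encode("utf-8", errors="surrogateescape")

assert name is None
assert round_tripped == raw_name
```

Note that `escaped` is not ordinary text: it contains a lone surrogate,
so a plain `escaped.encode("utf-8")` raises UnicodeEncodeError. The
exception handling has moved, not disappeared.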


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list
