Re: Grapheme clusters, a.k.a.real characters

Michael Torrie Fri, 14 Jul 2017 07:33:54 -0700

On 07/14/2017 07:31 AM, Marko Rauhamaa wrote:
> Of course, UTF-8 in a bytes object doesn't make the situation any
> better, but does it make it any worse?
> 
> As it stands, we have
> 
>    è --[encode>-- Unicode --[reencode>-- UTF-8
> 
> Why is one encoding format better than the other?

This is precisely the reasoning behind Google's choice of UTF-8 for
strings in Go, rather than an abstract type with O(1) indexing by code
point like Python's str.  Many other languages do the same.  The
argument is that, because of the very issues you mention, O(1) lookup
in a string isn't that important: indexing a Unicode string at a
particular position is rarely the right thing to do, so UTF-8 is just
fine as a native, in-memory representation.
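
To make the point concrete, here is a minimal Python 3 sketch of why
indexing by code point does not reliably land on a "real character".
The helper name describe() is made up for this example; everything else
is plain str methods plus unicodedata.normalize from the standard
library.

```python
import unicodedata

# "è" written as two code points: 'e' followed by COMBINING GRAVE ACCENT.
decomposed = "e\u0300"
# NFC normalization composes the pair into the single code point U+00E8.
composed = unicodedata.normalize("NFC", decomposed)


def describe(s):
    """Return (number of code points, number of UTF-8 bytes) for s.

    Hypothetical helper, used only for this illustration.
    """
    return len(s), len(s.encode("utf-8"))


print(describe(decomposed))  # (2, 3)
print(describe(composed))    # (1, 2)

# O(1) indexing hands back a code point, not the grapheme cluster:
# the first element of the decomposed form is a bare 'e'.
print(decomposed[0] == "e")  # True
```

Both strings display as the same single character, yet neither the code
point index nor the byte index identifies it without further
processing, which is the sense in which fast indexing buys little here.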

-- 
https://mail.python.org/mailman/listinfo/python-list
