Re: Grapheme clusters, a.k.a. real characters

Marko Rauhamaa Fri, 14 Jul 2017 12:13:28 -0700

Michael Torrie <[email protected]>:

> On 07/14/2017 07:31 AM, Marko Rauhamaa wrote:
>> Of course, UTF-8 in a bytes object doesn't make the situation any
>> better, but does it make it any worse?
>> 
>> As it stands, we have
>> 
>>    è --[encode>-- Unicode --[reencode>-- UTF-8
>> 
>> Why is one encoding format better than the other?
>
> This is precisely the logic behind Google using UTF-8 for strings in
> Go, rather than having some O(1) abstract type like Python has. And
> many other languages do the same. The argument is that because of the
> very issues that you mention, having O(1) lookup in a string isn't
> that important, since looking up a particular index in a unicode
> string is rarely the right thing to do, so UTF-8 is just fine as a
> native, in-memory type.


It pays to come in late.

Windows NT and Java evaded the 8-bit localization nightmare by going
UCS-2.

Python3 managed not to repeat the earlier UCS-2 blunders by going all
the way to UCS-4.

Go saw the futility of UCS-4 as a separate data type and dropped down to
UTF-8.

Unfortunately, Guile is following in Python3's footsteps.
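
For what it's worth, here is a rough Python sketch of what each of these
formats costs for the è in the quoted diagram. The helper name sizes() is
made up for illustration, and the emoji is just an arbitrary example of a
character outside the 16-bit range.

```python
import unicodedata

composed = "\u00e8"  # è as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + U+0300 combining grave

def sizes(s):
    """Return (code points, UTF-8 bytes, UTF-16 code units, UTF-32 code units)."""
    return (
        len(s),
        len(s.encode("utf-8")),
        len(s.encode("utf-16-le")) // 2,
        len(s.encode("utf-32-le")) // 4,
    )

print(sizes(composed))       # (1, 2, 1, 1)
print(sizes(decomposed))     # (2, 3, 2, 2)
print(sizes("\U0001F600"))   # (1, 4, 2, 1): needs a surrogate pair in UTF-16
```

Once the string is decomposed, none of the counts equals one, and
decomposed[0] is a bare "e" without its accent. Indexing by code point is
O(1) in Python 3, but what it indexes is not a grapheme cluster.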


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list
