Michael Torrie <torr...@gmail.com>:

> On 07/14/2017 07:31 AM, Marko Rauhamaa wrote:
>> Of course, UTF-8 in a bytes object doesn't make the situation any
>> better, but does it make it any worse?
>>
>> As it stands, we have
>>
>>    è --[encode>-- Unicode --[reencode>-- UTF-8
>>
>> Why is one encoding format better than the other?
>
> This is precisely the logic behind Google using UTF-8 for strings in
> Go, rather than having some O(1) abstract type like Python has. And
> many other languages do the same. The argument is that because of the
> very issues that you mention, having O(1) lookup in a string isn't
> that important, since looking up a particular index in a unicode
> string is rarely the right thing to do, so UTF-8 is just fine as a
> native, in-memory type.

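To make the trade-off in the quoted text concrete, here is a minimal Python sketch. It is not from the thread: the helper name code_point_at and the sample string are made up for illustration. It shows the same character in two encodings, and what "lookup by index" costs when the in-memory form is UTF-8: the bytes have to be scanned from the start, whereas a Python str can be indexed directly.

```python
# "è" is one code point (U+00E8); its size depends on the encoding.
s = "è"
utf8 = s.encode("utf-8")        # two bytes: 0xC3 0xA8
utf32 = s.encode("utf-32-le")   # four bytes, fixed width (UCS-4 style)


def code_point_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 data.

    Hypothetical helper: UTF-8 is variable width, so this walks the
    bytes from the start, which is O(n) in the length of the data.
    """
    count = -1
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)


text = "naïve café"
data = text.encode("utf-8")

# Direct indexing on str versus a linear scan over the UTF-8 bytes.
same = text[2] == code_point_at(data, 2)
```

Note that neither lookup returns a user-perceived character when combining marks are involved, which is the usual argument that indexing by code point is rarely what one wants anyway.
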
It pays to come in late. Windows NT and Java evaded the 8-bit
localization nightmare by going UCS-2. Python3 managed not to repeat the
earlier UCS-2 blunders by going all the way to UCS-4. Go saw the
futility of UCS-4 as a separate data type and dropped down to UTF-8.
Unfortunately, Guile is following in Python3's footsteps.


Marko