On Friday, November 15, 2013 9:43:17 AM UTC-5, Robin Becker wrote:
> Things went wrong when utf8 was not adopted as the standard encoding thus
> requiring two string types, it would have been easier to have a len function
> to count bytes as before and a glyphlen to count glyphs. Now as I understand
> it we have a complicated mess under the hood for unicode objects so they have
> a variable representation to approximate an 8 bit representation when
> suitable etc etc etc.
> 

Dealing with bytes and Unicode is complicated, and the 2->3 transition is not
easy, but let's please not spread the misunderstanding that somehow the
Flexible String Representation is at fault.  However you store Unicode code
points, they are different from bytes, and it is complex having to deal with
both.  You can't make the dichotomy go away; you can only choose where you
want to think about it.
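A small illustration of that dichotomy (a sketch, assuming Python 3.3+ with
CPython): len() on a str counts code points, len() on bytes counts bytes and
depends on the codec, and neither one counts glyphs. The variable names are
just for illustration, and the sys.getsizeof() figures are CPython
implementation details that vary by version, so no exact values are claimed.

```python
import sys
import unicodedata

# A str is a sequence of Unicode code points; len() counts code points.
s = "na\u00efve caf\u00e9"          # "naïve café", precomposed characters
print(len(s))                       # 10 code points

# Encoding produces bytes; len() of bytes counts bytes, which depends on the codec.
utf8 = s.encode("utf-8")
print(len(utf8))                    # 12: U+00EF and U+00E9 take 2 bytes each in UTF-8
print(len(s.encode("utf-16-le")))   # 20: every code point here takes 2 bytes

# Code points are not glyphs either: NFD splits U+00E9 into "e" plus
# U+0301 COMBINING ACUTE ACCENT, which displays as one character.
decomposed = unicodedata.normalize("NFD", "\u00e9")
print(len(decomposed))              # 2

# Under PEP 393, CPython picks 1, 2 or 4 bytes of storage per code point
# for each string, but len() and indexing behave identically in every case.
for ch in ("a", "\u20ac", "\U0001F600"):
    t = ch * 1000
    print(len(t), t[500] == ch, sys.getsizeof(t))
```

The last loop is the point about the Flexible String Representation: the
storage size grows with the widest code point in the string, but the
code-point semantics you program against do not change.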

--Ned.

> -- 
> Robin Becker

-- 
https://mail.python.org/mailman/listinfo/python-list