On 09/09/2013 08:28 AM, wxjmfa...@gmail.com wrote:
> Comment: Such differences never happen with utf.
But with utf, slicing strings is O(n) (well, that's a simplification, as
someone showed an algorithm that is O(log n)), whereas with a fixed-width
encoding (Latin-1, UCS-2, UCS-4) it is O(1). Do you understand what this
means?

> Complicate and full of side effects, eg :
>
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('aé')
> 39

Why on earth are you doing getsizeof? What are you expecting to prove?
Why are you even trying to concern yourself with implementation details?
As a programmer you should deal with Unicode. Period. All you should
care about is that you can properly index or slice a Unicode string and
that Unicode strings can be operated on at a reasonable speed. I.e.,
string[4] should give you the character at index 4, and len(string)
should return the length of the string in *characters*. The byte
encoding used behind the scenes is of no consequence other than speed
(and you have not shown any problem with speed).

>
> Is not a latin-1 "é" supposed to count as a latin-1 "a" ?

Of course it does. 'aé'[0] == 'a' and 'aé'[1] == 'é'. len('aé') returns 2.

> I picked up random methods, there may be variations, basically
> this general behaviour is always expected.

Eh? Can you point to something in the Unicode spec that doesn't work? I
don't even know that much about Unicode, yet it's clear you're either
deliberately muddying the waters with your stupid and pointless
arguments against the FSR, or you don't really understand the difference
between Unicode and a byte encoding. Which is it?
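
To make the indexing point concrete, here is a rough sketch in Python 3.
It is only an illustration, not how CPython stores strings internally.
utf8_index is a hypothetical helper written for this example. It shows
why finding character i in UTF-8 bytes means scanning from the start,
while a str gives you characters directly no matter how many bytes each
one takes behind the scenes.

```python
import sys


def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 bytes by scanning from the start.

    Hypothetical helper for illustration only: characters have variable
    width in UTF-8, so there is no way to jump straight to character i.
    """
    if i < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return data[start:].decode("utf-8")


s = "aé"

# What a programmer should care about: characters, not bytes.
assert len(s) == 2
assert s[0] == "a" and s[1] == "é"

# The byte counts differ by encoding, but that is an encoding detail.
assert len(s.encode("latin-1")) == 2
assert len(s.encode("utf-8")) == 3

# Same answer as s[1], but found by walking the bytes.
assert utf8_index(s.encode("utf-8"), 1) == s[1]

# getsizeof reports the in-memory size of the object. The numbers vary by
# Python version and platform, so they are printed, not asserted.
print(sys.getsizeof("a"), sys.getsizeof("aé"))
```

The getsizeof numbers will differ from the ones quoted above depending on
the build, which is rather the point: they describe the implementation,
not the string.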