On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote:
> On 07/12/2013 09:59 AM, Joshua Landau wrote:
> > If you're interested, the basics of it are that strings now use a
> > variable number of bytes to encode their values depending on whether
> > values outside of the ASCII range and some other range are used, as an
> > optimisation.
>
> "Variable number of bytes" is a problematic way of saying it. UTF-8 is a
> variable-number-of-bytes encoding scheme where each character can be 1,
> 2, 3, or 4 bytes, depending on the Unicode character. As you can
> imagine, this sort of encoding scheme would be very slow to do slicing
> with (looking up a character at a certain position). Python uses
> fixed-width encoding schemes, so they preserve the O(1) lookup speeds,
> but Python will use 1, 2, or 4 bytes for every character in the string,
> depending on what is needed. Just in case the OP might have
> misunderstood what you are saying.
>
> jmf sees the case where a string is promoted from one width to another,
> and thinks that the brief slowdown in string operations to accomplish
> this is a problem. In reality I have never seen anyone use the types of
> string operations his pseudo-benchmarks use, and in general Python 3's
> string behavior is pretty fast. And apparently much more correct than
> if jmf's ideas of Unicode were implemented.
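For readers following the quoted explanation, here is a minimal sketch of how the per-character width could be observed. It assumes CPython 3.3 or later (where PEP 393 applies); the helper name `bytes_per_char` is made up for this illustration, and `sys.getsizeof` reports implementation-specific sizes, so other interpreters may behave differently.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the storage used per code point in a string made only of `ch`.

    Hypothetical helper for illustration; assumes CPython 3.3+ (PEP 393).
    """
    # Subtracting the size of the 1-character string cancels the fixed
    # per-object header, leaving only the cost of the n extra characters.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+20AC": "\u20ac",
    "astral U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(label, "->", bytes_per_char(ch), "byte(s) per character")

# Promotion: one wide character forces the whole string to a wider layout.
ascii_only = "a" * 1000
mixed = ascii_only + "\u20ac"
print("ASCII only:", sys.getsizeof(ascii_only), "bytes")
print("with U+20AC appended:", sys.getsizeof(mixed), "bytes")

# Indexing still addresses a fixed-width slot, so no scan from the start
# of the string is needed to find position 500.
print(repr(mixed[500]), repr(mixed[-1]))
```

The width is chosen per string from its widest character, which is why appending a single U+20AC roughly doubles the storage of the otherwise ASCII string.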
------
Sorry, but you do not understand Unicode: what a Unicode Transformation
Format (UTF) is, what the goal of a UTF is, and why it is important for an
implementation to work with a UTF.

A short example: writing an editor properly with something like the FSR
is simply impossible.

jmf
--
http://mail.python.org/mailman/listinfo/python-list