On 2 Apr, 10:03, Chris Angelico <ros...@gmail.com> wrote:
> On Tue, Apr 2, 2013 at 6:24 PM, jmfauth <wxjmfa...@gmail.com> wrote:
> > An editor may reflect very well the example I gave. You enter a
> > thousand ASCII chars, then - boom - as you enter a non-ASCII
> > char, your editor (assuming it uses a mechanism like the FSR)
> > has to internally re-encode everything!
>
> That assumes that the editor stores the entire buffer as a single
> Python string. Frankly, I think this unlikely; the nature of
> insertions and deletions makes this impractical. (I've known editors
> that do function this way. They're utterly unusable on large files.)
>
> ChrisA
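For concreteness, the re-encoding being discussed is easy to observe
from the interpreter. Under PEP 393 (the FSR) a str is stored in 1, 2
or 4 bytes per char, chosen by the widest code point it contains;
since strings are immutable, "re-encoding" here means that a new
string built from a thousand ASCII chars plus one non-Latin-1 char is
held at 2 bytes per char throughout. A minimal sketch (exact sizes
and timings depend on the CPython 3.3 build):

    import sys
    from timeit import timeit

    # Under PEP 393 a str is stored in 1, 2 or 4 bytes per char,
    # chosen by the widest code point the string contains.
    ascii_only = 'a' * 1000                # Latin-1 range: 1 byte per char
    widened = 'a' * 1000 + '\u20ac'        # one euro sign: 2 bytes per char

    print(sys.getsizeof(ascii_only))       # length plus a small overhead
    print(sys.getsizeof(widened))          # roughly twice as large

    # Building the wider string re-encodes every char it copies:
    print(timeit("s + 'z'", setup="s = 'a' * 1000"))       # stays 1 byte/char
    print(timeit("s + '\u20ac'", setup="s = 'a' * 1000"))  # 2 bytes/char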
--------

No, no, no, no, ... as we say in French (it is a friendly form).

The length of a string can matter. This bad behaviour can happen on
every char. The most complicated chars are the chars with diacritics
and the ligatured [1, 2] chars, e.g. the chars used in Arabic script
[2]. It is somewhat funny to see that the FSR "fails" precisely on
the problems Unicode is supposed to solve/handle, e.g. normalization
or sorting [3]. Not really a problem for those of you who are
endorsing the good work Unicode does [5].

[1] A point which, in my mind, was not very well understood when I
read the PEP 393 discussion.

[2] Take a Unicode-compliant "TeX" engine and toy with the decomposed
forms of these chars. It is a very good way to understand what a char
can really be, when you wish to process text "seriously".

[3] I only test and tested these "chars" blindly, with the help of
the docs I have. Btw, when I test complicated "Arabic chars", I
noticed that Py33 "crashes" - it does not really crash, it gets stuck
in some kind of infinite loop (or is it due to "timeit"?). [4]

[4] Am I the only one who tests this kind of stuff?

[5] Unicode is a fascinating construction.

jmf
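P.S. For anyone who wants to toy with the decomposed forms mentioned
in [2], the stdlib unicodedata module is enough to see the
equivalence problem, and locale.strxfrm gives a first taste of real
collation. A minimal sketch (the locale-aware ordering depends on
whatever locale your system provides; serious collation would need
something like the third-party PyICU):

    import locale
    import unicodedata

    # One user-perceived character, two Unicode forms:
    composed = '\u00e9'         # 'e' with acute, precomposed
    decomposed = 'e\u0301'      # 'e' + COMBINING ACUTE ACCENT

    print(composed == decomposed)              # False: code points differ
    print(len(composed), len(decomposed))      # 1 2

    # Normalization makes the two forms comparable:
    print(unicodedata.normalize('NFC', decomposed) == composed)    # True
    print(unicodedata.normalize('NFD', composed) == decomposed)    # True

    # Code-point order is not linguistic order; strxfrm is a start:
    locale.setlocale(locale.LC_COLLATE, '')    # the user's default locale
    words = ['cote', 'coté', 'côte', 'côté']
    print(sorted(words))                       # raw code-point order
    print(sorted(words, key=locale.strxfrm))   # locale-aware order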