On Thursday, July 25, 2013 at 22:45:38 UTC+2, Ian wrote:
> On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
> > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
> >
> >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
> >> <steve+comp.lang.pyt...@pearwood.info> wrote:
> >>> On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:
> >>>> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers
> >>>> that are codepoints of text characters within buffers and strings.
> >>>> Rather, Emacs uses a variable-length internal representation of
> >>>> characters, that stores each character as a sequence of 1 to 5 8-bit
> >>>> bytes, depending on the magnitude of its codepoint[1]. For example,
> >>>> any ASCII character takes up only 1 byte, a Latin-1 character takes up
> >>>> 2 bytes, etc. We call this representation of text multibyte.
> >>>
> >>> Well, you've just proven what Vim users have always suspected: Emacs
> >>> doesn't really exist.
> >>
> >> ... lolwut?
> >
> >
> > JMF has explained that it is impossible, impossible I say!, to write an
> > editor using a flexible string representation. Since Emacs uses such a
> > flexible string representation, Emacs is impossible, and therefore Emacs
> > doesn't exist.
> >
> > QED.
>
> Except that the described representation used by Emacs is a variant of
> UTF-8, not an FSR. It doesn't have three different possible encodings
> for the letter 'a' depending on what other characters happen to be in
> the string.
>
> As I understand it, jfm would be perfectly happy if Python used UTF-8
> (or presumably the Emacs variant) as its internal string
> representation.
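To make the distinction drawn in the quoted text concrete, here is a minimal Python sketch. It assumes CPython 3.3 or later, where the flexible string representation (FSR, PEP 393) is in use; the helper names `utf8_width`, `utf32_width` and `fsr_bytes_per_char` are made up for this illustration, and the FSR figure is only an estimate derived from `sys.getsizeof`, not an official API for inspecting the internal layout.

```python
import sys


def utf8_width(ch):
    """Bytes needed for one character in UTF-8 (depends only on ch)."""
    return len(ch.encode("utf-8"))


def utf32_width(ch):
    """Bytes needed for one character in UTF-32 (always 4)."""
    return len(ch.encode("utf-32-le"))


def fsr_bytes_per_char(s, n=1000):
    """Estimate the per-character storage of s under CPython's FSR.

    Assumption: CPython >= 3.3. The fixed object header cancels out when
    the sizes of s repeated (n + 1) times and of s itself are subtracted,
    leaving n * len(s) characters' worth of payload.
    """
    extra_bytes = sys.getsizeof(s * (n + 1)) - sys.getsizeof(s)
    return extra_bytes // (n * len(s))


samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, e-acute, euro, emoji

# UTF-8: the width of a character never depends on its neighbours.
print([utf8_width(c) for c in samples])    # [1, 2, 3, 4]

# UTF-32: every character takes 4 bytes.
print([utf32_width(c) for c in samples])   # [4, 4, 4, 4]

# FSR: the width is chosen per string from its widest character, so the
# storage used for 'a' depends on what else is in the string.
print(fsr_bytes_per_char("a"))             # 1 on CPython 3.3+
print(fsr_bytes_per_char("a\u20ac"))       # 2 on CPython 3.3+
print(fsr_bytes_per_char("a\U0001F600"))   # 4 on CPython 3.3+
```

The last three lines are the "three different possible encodings for the letter 'a'" mentioned above: under the FSR the same character is stored in 1, 2 or 4 bytes depending on the widest code point in the string, whereas under UTF-8 or UTF-32 its width is fixed.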
------
And Emacs is probably working smoothly.

Your comment summarized all of this very correctly and very concisely.

utf8/16/32? I do not care. They all work correctly, smoothly and efficiently. In fact, these UTFs already do correctly what the FSR does in the wrong way.

My preference? utf32. Why? It is the simplest and consequently the best-performing choice. I'm not a narrow-minded ASCII user. (I do not claim to belong to those who are squaring the circle; I claim to belong to those who know that squaring the circle is not possible.)

Note: text processing tools, or tools that have to process characters, and the tools used to build these tools, are all moving to utf32, if they have not already done so. There are technical reasons behind this which go beyond pure raw Unicode. They are, however, still 100% Unicode compliant.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list