> > - decode
> > - encode
> > - decode
>
> These steps aren't as slow as you might think.

I have already said in another mail that they are not a bottleneck, and
this change is not meant to improve the performance of st. It's only
about making the code better.

> Look at how "decoding" is done. If you really think this slows down st
> then utf8len can be optimized further. Decoding isn't as heavy as you
> think in comparison to the 32 bit burden you add and its illogic.

Sorry, but I don't understand what you mean here.

> > I think we should decode the utf8 character in the input, store it
> > in raw unicode with 4 bytes, and encode again in output (usually in
> > getsel or in printer functions). The memory usage is going to be the
> > same, because we store the utf8 string with 'char c[UTF_SIZ]', where
> > UTF_SIZ is 4 (although it should be bigger because if we accept
> > unicode of 32 bits then we can receive utf8 strings of 6 bytes).
>
> This is exactly the reason why st keeps this internal representation: to
> adapt to future expansions of UTF-8, no matter what any crippled
> standard says. If you adapt to a dynamically growing bytes per char
> string you end up with a meta format like UTF-8 too.

If we decode in only one place and encode in only one place, then
adapting to a new standard only requires modifying a typedef and the
encode/decode routines (which would have to be modified anyway).

> As said, if you think utf8len should be optimized, look at [0].

This is a modification we can apply, of course. However, I think that if
we move the representation to UTF-32 we are not going to need utf8len,
because the length of the utf8 character is calculated during the
conversion.

> Another question arises from the st UTF-8 support: Who will implement
> the normalisation? Will it be included in the new internal string
> representation? Now this question can be easily answered because st
> keeps the raw representation. Helper functions take care of it, when
> it's needed.

I was thinking of taking the value that utf8decode generates. At the
moment that is already the value we use as the utf8 string, not the
original input, because of the decode/encode pair that runs before
calling tputc.

> UTF-32 (UTF-16 is a joke) is a disease, fight it.

This was my idea, and I don't see what the problems with UTF-32 are
here; please let me know.

Greetings,

--
Roberto E. Vargas Caballero
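
To make the "decode once on input, encode once on output" idea above
concrete, here is a minimal sketch. It is not code from st: the names
Rune, RUNE_INVALID, rune_decode and rune_encode are hypothetical, and it
assumes code points are limited to U+10FFFF, so a sequence is at most 4
bytes and the older 5- and 6-byte forms are rejected. Accepting a wider
range would mean changing the typedef and these two routines only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Rune;        /* hypothetical: one decoded code point */
#define RUNE_INVALID 0xFFFDu  /* U+FFFD REPLACEMENT CHARACTER */

/*
 * Decode one UTF-8 sequence from s (n bytes available) into *r.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * incomplete and the caller should wait for more input.
 * Malformed input stores RUNE_INVALID in *r.
 */
size_t
rune_decode(const unsigned char *s, size_t n, Rune *r)
{
	/* smallest code point that legitimately needs 1..4 bytes */
	static const Rune min[] = {0, 0, 0x80, 0x800, 0x10000};
	size_t len, i;
	Rune cp;

	assert(s != NULL && r != NULL);
	if (n == 0)
		return 0;
	if (s[0] < 0x80) {
		len = 1; cp = s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; cp = s[0] & 0x07;
	} else {
		*r = RUNE_INVALID;    /* stray continuation or 5/6-byte lead */
		return 1;
	}
	if (n < len)
		return 0;
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			*r = RUNE_INVALID; /* not a continuation byte */
			return 1;
		}
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates and out-of-range values */
	if (cp < min[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		cp = RUNE_INVALID;
	*r = cp;
	return len;
}

/*
 * Encode r as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written; the length falls out of the
 * conversion, so no separate utf8len-style helper is needed here.
 */
size_t
rune_encode(Rune r, unsigned char out[4])
{
	assert(out != NULL);
	if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
		r = RUNE_INVALID;
	if (r < 0x80) {
		out[0] = (unsigned char)r;
		return 1;
	}
	if (r < 0x800) {
		out[0] = (unsigned char)(0xC0 | (r >> 6));
		out[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if (r < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (r >> 12));
		out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (r >> 18));
	out[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}
```

With this layout a stored cell costs sizeof(Rune), which is 4 bytes,
the same as 'char c[UTF_SIZ]' with UTF_SIZ equal to 4. The input path
would call rune_decode once per character, and only the output paths
(for example the selection and printer code) would call rune_encode.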
