On Tuesday 01 July 2008 09.56:29 Mattias Gaertner wrote:
> On Tue, 01 Jul 2008 09:35:35 +0200
>
> Luca Olivetti <[EMAIL PROTECTED]> wrote:
> > OTOH using variable length characters will make string operations
> > expensive (since you can't just multiply the index by 2 or 4 but you
> > have to examine the string from the beginning, and the length in
> > bytes isn't the same as the length in characters).
>
> It's amazing that this argument come up again and again. But I know
> hardly any code that need this index to char mapping. And the code,
> that need it is seldom time critical.
> (I must admit, I feared the same some years ago. But the extra cost is
> practically a myth.)
>

A good example of code that does need it is text layout calculation,
where it is necessary to iterate over characters (glyphs) over and over
again. MSEgui uses widestrings directly; fpGUI converts to widestrings
before processing (or does it use the slow UTF-8 routines?). I once
switched MSEgui to UTF-8 because of the widestring problems in FPC. One
or two months later, when I implemented complex layout calculation with
tab stops and justified text, I switched back to widestrings...
This applies to a GUI framework; an RTL may have other priorities.
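
To make the index-to-character cost under discussion concrete, here is a
minimal sketch. It is written in Python rather than Pascal and is not taken
from MSEgui, fpGUI, or the FPC RTL; the function names `utf8_offset_of` and
`utf16_offset_of` are hypothetical. It assumes well-formed input and, for
the UTF-16 case, text that stays inside the base plane (no surrogate pairs).

```python
def utf8_offset_of(data: bytes, char_index: int) -> int:
    """Byte offset of the char_index-th code point in UTF-8 data.

    The scan starts at the beginning, so the cost grows with the index.
    """
    if char_index < 0:
        raise IndexError(char_index)
    seen = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes; any other
        # byte starts a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError(char_index)


def utf16_offset_of(data: bytes, char_index: int) -> int:
    """Byte offset of the char_index-th 16-bit code unit: one multiply.

    This equals the code point position only while the text contains
    no surrogate pairs (assumption of this sketch).
    """
    offset = char_index * 2
    if char_index < 0 or offset >= len(data):
        raise IndexError(char_index)
    return offset


text = "Grüße, wörld"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# Characters, UTF-8 bytes, UTF-16 bytes: three different lengths.
print(len(text), len(utf8), len(utf16))
# Where the fifth character ("e") starts in each encoding.
print(utf8_offset_of(utf8, 4), utf16_offset_of(utf16, 4))
```

A layout loop that walks the text strictly from left to right can keep a
running byte offset and avoid the rescan; the linear cost only appears when
code jumps to an arbitrary character index.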

> Most code only needs the number of bytes. And this still cost under
> pascal O(1).
> In fact if a UTF8String or UTF16String would be added, then I would
> say, it would be a waste of memory to store an extra PtrInt for the
> number of characters.
>

Agreed. I think the best compromise for a GUI framework is
reference-counted widestrings, where normally physical index = code
point index. If one needs characters that are outside the base plane
(BMP), one must use surrogate pairs and accept more complicated and
slower processing. I assume this will seldom be needed.

Martin
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal