On Tue, 01 Jul 2008 09:35:35 +0200 Luca Olivetti <[EMAIL PROTECTED]> wrote:
> Marco van de Voort wrote:
> >>> They have a UTF-16/UCS-2 internal representation, same as MSEgui
> >>> which works very well and is fast and handy BTW.
> >> And len, slicing, etc. work as expected.
> >> Note that if you need characters beyond $ffff you have to compile
> >> it with wide Unicode support, and in that case every character
> >> will use 4 bytes.
> >>
> > That's IMHO a faulty system. It requires you to choose between an
> > incomplete solution or making strings a horrible memory hog.
>
> OTOH using variable length characters will make string operations
> expensive (since you can't just multiply the index by 2 or 4 but you
> have to examine the string from the beginning, and the length in
> bytes isn't the same as the length in characters).

It's amazing that this argument comes up again and again. But I know
of hardly any code that needs this index-to-char mapping, and the code
that does need it is seldom time-critical.
(I must admit, I feared the same some years ago. But the extra cost is
practically a myth.)

> > But maybe that doesn't
> > matter for mere scripting languages (though I wonder then why they
> > didn't choose UTF-32 directly)
> >
> > Surrogates are not nice, but they were invented for a reason.
>
> Well, yes, they're a trade-off between performance and memory
> consumption, but I fear we're losing one of the advantages that
> Pascal has over C: fast and simple string handling.

Most code only needs the number of bytes, and in Pascal that is still
O(1). In fact, if a UTF8String or UTF16String type were added, I would
say it would be a waste of memory to store an extra PtrInt for the
number of characters.

Mattias
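
The distinction drawn above, between the byte length of a string and its
length in characters, can be sketched roughly as follows. This is only an
illustration in Python rather than Pascal; the helper names
utf8_char_count and utf8_char_offset are hypothetical, and the sketch
assumes the input is valid UTF-8.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points by skipping continuation bytes (10xxxxxx). O(n)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_char_offset(data: bytes, index: int) -> int:
    """Byte offset of the code point at 0-based position `index`. O(n)."""
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # ASCII or lead byte: a code point starts here
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


# "naïve €5" written with escapes so the code points are unambiguous
data = "na\u00efve \u20ac5".encode("utf-8")

byte_len = len(data)               # O(1): the length is stored with the buffer
char_len = utf8_char_count(data)   # O(n): every byte has to be examined

print(byte_len, char_len)          # 11 8
print(utf8_char_offset(data, 6))   # 7, the byte where the euro sign starts
```

Copying, concatenating, or allocating the string only needs byte_len. The
scan is needed only by code that really addresses the n-th character,
which is the case described above as rare and seldom time-critical.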