Dan Sugalski wrote:
1) Parrot will *not* require Unicode. Period. Ever.
My old 8MB Visor Prism thanks you.
:) As does my Game Boy.
*) Transforms a stream of bytes to and from a set of 32-bit integers
*) Manages the byte buffer (so buffer positioning and manipulation by code-point offset are handled here)
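The two encoding-layer duties listed above can be illustrated with a minimal Python sketch for UTF-8, assuming well-formed input (no overlong-form or surrogate checks). The function names `utf8_decode`, `utf8_encode` and `byte_offset` are hypothetical and are not Parrot's API. The last function shows why positioning by code-point offset belongs in this layer: in a variable-width encoding, finding the n-th code point means scanning the buffer from the start.

```python
def utf8_decode(buf: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of code points (32-bit integers).

    Sketch only: assumes well-formed input; overlong forms and
    surrogates are not rejected.
    """
    out = []
    i = 0
    while i < len(buf):
        b = buf[i]
        if b < 0x80:              # 0xxxxxxx: 1-byte sequence (ASCII)
            cp, n = b, 1
        elif b >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
            cp, n = b & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + n > len(buf):
            raise ValueError("truncated sequence")
        for cont in buf[i + 1:i + n]:
            if cont >> 6 != 0b10:  # continuation bytes look like 10xxxxxx
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (cont & 0x3F)
        out.append(cp)
        i += n
    return out


def utf8_encode(cps: list[int]) -> bytes:
    """Encode a list of code points back into UTF-8 bytes."""
    out = bytearray()
    for cp in cps:
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | cp >> 6, 0x80 | cp & 0x3F))
        elif cp < 0x10000:
            out += bytes((0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F,
                          0x80 | cp & 0x3F))
        elif cp <= 0x10FFFF:
            out += bytes((0xF0 | cp >> 18, 0x80 | (cp >> 12) & 0x3F,
                          0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F))
        else:
            raise ValueError(f"code point out of range: {cp:#x}")
    return bytes(out)


def byte_offset(buf: bytes, n: int) -> int:
    """Return the byte offset where code point n starts (linear scan).

    Returns len(buf) when n is one past the last code point.
    """
    count = 0
    for i, b in enumerate(buf):
        if b >> 6 != 0b10:  # any non-continuation byte starts a code point
            if count == n:
                return i
            count += 1
    if count == n:
        return len(buf)
    raise IndexError("code point offset out of range")
```

For example, in `"aé€𝄞".encode("utf-8")` the four code points take 1, 2, 3 and 4 bytes, so they start at byte offsets 0, 1, 3 and 6; no offset can be found without walking the bytes before it.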
What's wrong with, *as an internal optimization only*, storing the string in the more efficient-to-access format of the patch? I mean, yeah, you don't want it to be externally visible, but if you're going to treat a string as a series of ints, why not store it that way?
I really see no reason to store strings as UTF-{8,16,32} and waste CPU cycles on decoding them when we can do a lossless conversion to a format that's both more compact (in the most common cases) and faster.
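One possible reading of "a lossless conversion to a format that's both more compact and faster" is sketched below: decode once, then keep the code points in the narrowest fixed-width array that can hold the largest of them (1, 2 or 4 bytes per element). This is an assumption for illustration, not necessarily the format used in the patch under discussion; `pack_fixed_width` is a hypothetical name, and the sketch relies only on Python's standard `array` module.

```python
from array import array


def pack_fixed_width(cps: list[int]) -> array:
    """Store code points in the narrowest fixed-width array that fits them all.

    Hypothetical helper: 1 byte/element if every code point is <= 0xFF,
    2 bytes if every one is <= 0xFFFF, otherwise 4 bytes.
    """
    top = max(cps, default=0)
    width = 1 if top <= 0xFF else 2 if top <= 0xFFFF else 4
    # Pick the first unsigned typecode whose item size is at least `width`
    # bytes (item sizes of 'H', 'I', 'L' are platform-dependent minimums).
    code = next(c for c in "BHIL" if array(c).itemsize >= width)
    return array(code, cps)
```

Indexing such an array is O(1), unlike the linear scan UTF-8 needs. Pure-ASCII text costs one byte per character (the same as UTF-8 and a quarter of UTF-32), and text confined to the Basic Multilingual Plane, such as CJK, costs two bytes per character instead of UTF-8's three.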
Erm... UTF-32 is a fixed-width encoding. (That Unicode is inherently a variable-width character set is a separate issue, though, given the scope of the project, a correct decision.) I'm fine with leaving ICU to store Unicode data internally any damn way it wants, though--partly because the IBM folks are Darned Clever and I trust their judgement, and partly because it means we don't have to write all the code to properly handle Unicode.
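The distinction drawn above, between a fixed-width *encoding* and a variable-width *character set*, can be seen in a few lines of Python using the standard `unicodedata` module: UTF-32 spends exactly four bytes per code point, yet a single user-visible character such as "é" may be one code point or two (a base letter followed by a combining accent).

```python
import unicodedata

composed = "\u00e9"     # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# UTF-32 is fixed-width per code point ("-le" avoids a byte-order mark)...
print(len(composed.encode("utf-32-le")))    # 4
print(len(decomposed.encode("utf-32-le")))  # 8

# ...but both strings display as the same single character.
print(len(decomposed))                                       # 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

So even with a fixed-width code-point store, operations on user-perceived characters still have to deal with variable-length sequences of code points.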
Other variable-width encodings will likely be stored internally as fixed-width buffers, at least once the data gets manipulated some. Assuming I'm not convinced that Unicode is the true way to go... :)
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk