Re: Plans for string processing

Dan Sugalski Tue, 13 Apr 2004 12:55:37 -0700

At 12:44 PM -0700 4/13/04, Brent 'Dax' Royal-Gordon wrote:

Dan Sugalski wrote:
1) Parrot will *not* require Unicode. Period. Ever.
My old 8MB Visor Prism thanks you.

:) As does my gameboy.

*) Transform stream of bytes to and from a set of 32-bit integers
*) Manages byte buffer (so buffer positioning and manipulation by code point offset is handled here)
What's wrong with, *as an internal optimization only*, storing the string in the more efficient-to-access format of the patch? I mean, yeah, you don't want it to be externally visible, but if you're going to treat a string as a series of ints, why not store it that way?

I really see no reason to store strings as UTF-{8,16,32} and waste CPU cycles on decoding it when we can do a lossless conversion to a format that's both more compact (in the most common cases) and faster.

Erm... UTF-32 is a fixed-width encoding. (That Unicode is inherently a variable-width character set is a separate issue, though given the scope of the project a correct decision.) I'm fine with leaving ICU to store Unicode data internally any damn way it wants, though--partly because the IBM folks are Darned Clever and I trust their judgement, and partly because it means we don't have to write all the code to properly handle Unicode.
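
(To make the fixed-width versus variable-width point concrete, here is a minimal sketch in Python. It is not Parrot or ICU code; the function names are made up for illustration, and it assumes array("I") items are 4 bytes wide, which holds on common platforms but is not guaranteed. It shows the "bytes to and from a set of 32-bit integers" transform, and why finding a code point by offset needs a scan in UTF-8 but only an index in a fixed-width buffer.)

```python
from array import array


def decode_to_fixed_width(data: bytes, encoding: str = "utf-8") -> array:
    """Decode a byte stream into one unsigned integer per code point."""
    # Assumption: array("I") items are 4 bytes wide (true on common platforms).
    return array("I", (ord(ch) for ch in data.decode(encoding)))


def encode_from_fixed_width(code_points: array, encoding: str = "utf-8") -> bytes:
    """Turn the fixed-width buffer back into the original byte encoding."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)


def utf8_byte_offset(data: bytes, index: int) -> int:
    """Byte offset of the code point at `index`, found by a linear scan."""
    seen = 0
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: starts a code point
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


raw = "na\u00efve \u2603".encode("utf-8")  # mixes 1-, 2- and 3-byte sequences
fixed = decode_to_fixed_width(raw)

# Variable-width: locating code point 6 means walking the bytes before it.
snowman_offset = utf8_byte_offset(raw, 6)

# Fixed-width: code point 6 is a plain index into the buffer.
snowman = fixed[6]
```

The round trip through encode_from_fixed_width gives back the original bytes, so the conversion is lossless; the cost is that every code point takes a full integer slot regardless of how small it is.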

Other variable-width encodings will likely be stored internally as fixed-width buffers, at least once the data gets manipulated some. Assuming I'm not convinced that Unicode is the true way to go... :)

-- Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
