On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote:
> On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote:

> > I think it will still be worthwhile to investigate
> > converting strings into a fixed-width encoding of some sort
> > instead of performing scans on variable-width encodings.
> 
> Agreed... if we figure out our Unicode strategy.

Jarkko's view was that if he were doing Perl 5 Unicode again he would opt for
fixed-width 32 bit rather than UTF-8, because a lot of algorithms,
particularly in regexps, assume constant-time random access.

Space-wise, a better compromise, at only slightly more complexity
(vtables for accessors feel natural for this), is to go for fixed width,
using the smallest of 7 bit, 8 bit, 16 bit or 32 bit that will hold the
largest Unicode code point in the string.

And no "nasty" UTF-8 or UTF-16. The 16 bit form would be UCS-2; if a string
would have needed surrogate pairs, switch it to UTF-32.

Everything is still fixed width, with constant-time indexed access.
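
As a rough sketch of the idea only (Python purely for illustration; the names
pick_width, FixedWidthString and code_point_at are hypothetical and not any
Parrot or Perl API), choosing the width and indexing could look like this:

```python
def pick_width(text):
    """Smallest fixed width in bits (7, 8, 16 or 32) that holds every
    code point in text. Hypothetical helper, not a Parrot API."""
    top = max(map(ord, text), default=0)
    if top < 0x80:
        return 7
    if top < 0x100:
        return 8
    if top < 0x10000:
        return 16      # UCS-2: no surrogate pairs are ever stored
    return 32          # anything beyond the BMP goes straight to UTF-32


class FixedWidthString:
    """Stores every code point in the same number of bytes."""

    def __init__(self, text):
        self.bits = pick_width(text)
        # Assumption: 7 bit data still occupies one byte per code point.
        self.unit = 1 if self.bits <= 8 else self.bits // 8
        self.buf = b"".join(
            ord(ch).to_bytes(self.unit, "little") for ch in text
        )

    def __len__(self):
        return len(self.buf) // self.unit

    def code_point_at(self, index):
        # The offset is computed directly; no scan over earlier characters.
        start = index * self.unit
        return int.from_bytes(self.buf[start:start + self.unit], "little")
```

With this layout, FixedWidthString("abc") uses one byte per character, while a
string containing a single code point above U+FFFF uses four bytes for every
character. That is the space cost of never having to scan.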

Nicholas Clark
