On Wed, Aug 10, 2005 at 02:56:46PM +0200, Leopold Toetsch wrote:
> Nicholas Clark via RT wrote:
>
> >I thought that one thing Jarkko learned from perl 5's Unicode model was
> >that the amount of code and pain to support a variable length encoding
> >was greater than the space saving that that encoding gives.
> >
> >In turn Dan had decided that Parrot should internally unpack to some form
> >of fixed width encoding. So all Unicode would be stored internally in the
> >shortest of ISO-8859-1, UCS-16 and UCS-32 that encompassed all the code
> >points used.
>
> Yes, with the enhancement (also proposed by Dan) that a conversion to
> fixed width encoding is done lazily i.e. on demand. The substr would be
> typically such a place to change encoding to fixed.
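A minimal C sketch of the scheme described above, under stated assumptions: the input is valid UTF-8, and the names (lazy_str, ensure_fixed, lazy_substr, etc.) are hypothetical, not Parrot's actual STRING API. The string keeps its variable-width bytes until a character-indexed operation such as substr needs them, then unpacks once into the narrowest of 1, 2 or 4 bytes per code point that holds every code point present.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type for illustration; not Parrot's real STRING. */
typedef struct {
    const unsigned char *utf8; /* variable-width source bytes (assumed valid UTF-8) */
    size_t utf8_len;
    void *fixed;               /* fixed-width code points; NULL until first needed */
    size_t n_chars;
    int width;                 /* bytes per code point: 1, 2 or 4; 0 = not unpacked yet */
} lazy_str;

/* Decode valid UTF-8 into code points; returns the number decoded. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *out) {
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = s[i++];
        uint32_t cp;
        int extra;
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        while (extra-- > 0)
            cp = (cp << 6) | (s[i++] & 0x3F);
        out[n++] = cp;
    }
    return n;
}

/* Unpack once to the narrowest fixed width holding every code point:
   1 byte (ISO-8859-1 range), 2 bytes (BMP), otherwise 4 bytes.
   malloc error handling is omitted for brevity. */
static void ensure_fixed(lazy_str *s) {
    if (s->width != 0)
        return;                            /* already unpacked: lazy path is free */
    uint32_t *cps = malloc((s->utf8_len + 1) * sizeof *cps); /* chars <= bytes */
    size_t n = decode_utf8(s->utf8, s->utf8_len, cps);
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (cps[i] > max) max = cps[i];
    s->width = max <= 0xFF ? 1 : max <= 0xFFFF ? 2 : 4;
    s->n_chars = n;
    if (s->width == 4) {
        s->fixed = cps;                    /* already 32-bit: keep this buffer */
        return;
    }
    s->fixed = malloc(n * (size_t)s->width + 1);
    for (size_t i = 0; i < n; i++) {
        if (s->width == 1) ((uint8_t *)s->fixed)[i] = (uint8_t)cps[i];
        else               ((uint16_t *)s->fixed)[i] = (uint16_t)cps[i];
    }
    free(cps);
}

/* O(1) access by character index once the string is fixed width. */
static uint32_t code_point_at(const lazy_str *s, size_t i) {
    switch (s->width) {
    case 1:  return ((const uint8_t *)s->fixed)[i];
    case 2:  return ((const uint16_t *)s->fixed)[i];
    default: return ((const uint32_t *)s->fixed)[i];
    }
}

/* substr is where the lazy conversion is triggered; copies count code points. */
static void lazy_substr(lazy_str *s, size_t start, size_t count, uint32_t *out) {
    ensure_fixed(s);
    assert(start <= s->n_chars && count <= s->n_chars - start);
    for (size_t k = 0; k < count; k++)
        out[k] = code_point_at(s, start + k);
}

static lazy_str lazy_from_utf8(const unsigned char *bytes, size_t len) {
    lazy_str s = { bytes, len, NULL, 0, 0 };
    return s;
}

static void lazy_free(lazy_str *s) { free(s->fixed); s->fixed = NULL; }
```

The trade-off is the one discussed in this thread: after the one-time unpack, substr and indexing cost O(1) per character instead of a scan through variable-length bytes, at the price of up to 4 bytes per character for strings that contain anything outside the BMP.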

Aha. That's the subtlety that I missed from all this. The form of the "fix"

> >But having dealt with the fun of variable length encodings, my gut feeling
> >is with Jarkko, that it's probably better to stay fixed width internally.
>
> My gut feeling is just the same.

Thanks for the clarification.

Nicholas Clark