At 09:56 AM 2/9/2001 -0200, Branden wrote:
>Jarkko Hietaniemi wrote:
> > > Umm, one way or another I suspect UTF-8 will be in there.
> >
> > I suspect so too but very grudgingly.  As Dan said dealing with
> > variable length data is a major pain.  UTF-8 is certainly a much
> > better designed VLD than most but it's still a pain.
> >
>
>I guess that's why strings should be abstracted and only accessed by an API
>from everywhere outside the string API handling functions.

That would be one reason, yep.

>The string API should be sufficiently smart to be able to convert data from
>one encoding to another as it's more convenient.

No, the vtable functions for the variables should know how to convert from 
and to perl's preferred string representations, and can do whatever Bizarre 
Magic they care to internally.
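
A rough sketch of that division of labor, written in Python only for brevity. Everything here is hypothetical: the class names, the get_string/set_string methods, and the choice of a list of code points as the "preferred" representation are assumptions made for illustration, not anything from the actual perl design. The point is only that each variable type hides its own storage and meets the others through one agreed form.

```python
class Latin1Var:
    """Hypothetical variable type that stores one byte per character."""
    def __init__(self, raw=b""):
        self._raw = raw

    def get_string(self):
        # Convert internal storage to the assumed preferred form (code points).
        return list(self._raw)

    def set_string(self, code_points):
        # bytes() raises ValueError for any code point above 255.
        self._raw = bytes(code_points)


class Utf8Var:
    """Hypothetical variable type that keeps UTF-8 internally."""
    def __init__(self, raw=b""):
        self._raw = raw

    def get_string(self):
        return [ord(ch) for ch in self._raw.decode("utf-8")]

    def set_string(self, code_points):
        self._raw = "".join(chr(cp) for cp in code_points).encode("utf-8")


def assign(dst, src):
    # Neither side needs to know how the other stores its data.
    dst.set_string(src.get_string())


latin = Latin1Var(b"caf\xe9")
utf8 = Utf8Var()
assign(utf8, latin)
```

After the assignment, utf8 holds the UTF-8 bytes for the same four characters, and no generic string API had to understand either internal format.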

>On the other side, for a string that is matched against regexps, it doesn't
>matter much if it has variable character length, since regexps normally read
>all the string anyway, and indexing characters isn't much of a concern.

You underestimate the impact of variable-length data, I think. Regexes 
should go rather faster on fixed-length than variable-length data. How much 
so depends on your processor. (I can guarantee that Alphas will run a 
darned sight faster on UTF-32 than UTF-8...)
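
To make the cost concrete, here is a minimal sketch (Python, illustrative only; the helper names are made up for this example) of what "give me character n" involves in each encoding. With UTF-8 every preceding lead byte has to be inspected to learn how far to skip; with UTF-32 the position is plain arithmetic. It assumes well-formed input and says nothing about how a real regex engine would be written.

```python
def utf8_len(lead):
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_char_at(buf, n):
    """Character n of a UTF-8 buffer: must walk past the first n characters."""
    i = 0
    for _ in range(n):
        i += utf8_len(buf[i])
    return buf[i:i + utf8_len(buf[i])].decode("utf-8")


def utf32_char_at(buf, n):
    """Character n of a UTF-32 (little-endian) buffer: a fixed offset."""
    return buf[4 * n:4 * n + 4].decode("utf-32-le")


text = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e"
u8 = text.encode("utf-8")       # 1 to 3 bytes per character here
u32 = text.encode("utf-32-le")  # always 4 bytes per character
```

Both functions return the same character for the same index; the difference is that the UTF-8 version does work proportional to n, while the UTF-32 version pays for its constant-time lookup with a larger buffer.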

>It would be nice if the user had some control to this, for example by saying
>"I don't care this string will be used by substr, leave it in UTF-8 since
>it's too big and I don't want to waste memory!", or "This string isn't too
>big, so I should convert it to bloated UTF-32 at once!", or even "use less
>'memory';".

That would be:

   my str $foo : utf8 : fixed;

or possibly

   use less qw(memory);

Generally speaking, you probably don't want to do this. Odds are if you 
think you know what's going on better than the compiler, you're wrong. (Not 
always, but in a non-trivial number of cases, in my experience.)

>And I believe 8-bit ASCII will always be an option, for who doesn't care
>about extended characters and want the best of both worlds on speed and
>memory usage.

8-bit characters in general, yep (ASCII is really 7-bit): ASCII, EBCDIC, or 
raw byte buffers.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
