Dan Sugalski writes:
: I'm not sure that raw's the right word, given that the data is really
: Unicode. It's not raw in the sense that a JPEG image or executable is raw data.

I'm suggesting it might be raw in that very sense, and simultaneously
be perfectly valid "internal" Unicode. Otherwise you couldn't "slurp"
it. To me, "raw" means "I know exactly what I'm doing, so keep your
cotton-pickin' fingers off it until I tell you to put your cotton-pickin'
fingers on it."

: I'm half-tempted to implement a 'touch count' in the scalars somewhere to
: track the number of times something's been dealt with in a non-native way
: to use as an indicator of whether we should just up and convert things. I
: can't shake off the feeling that it'll be more expensive than not doing it,
: though.

My feelings agree with your feelings there. My guess is we have to
glue a big switch on the side of something so the programmer can tell
us whether to be pessimistic or optimistic. My guess is we want to
attach that big switch to each of the input stacks, not to the current
lexical scope, since in a single scope there may be several data paths,
determined primarily by where the data came from. (I expect that
attaching such a big switch to each variable would be overkill, but
some people seem to like overkill.)
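
To make the "big switch" idea concrete, here is a minimal sketch in Python, used purely for illustration. Every name in it (Policy, Scalar, InputSource) is hypothetical and not part of any Perl design. It assumes the switch has just two settings: convert on input (pessimistic) or keep the raw bytes and convert only when text is first needed (optimistic). The switch lives on the input source, so two values in the same scope can follow different policies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Policy(Enum):
    PESSIMISTIC = "convert on input"    # decode as soon as data arrives
    OPTIMISTIC = "convert on demand"    # keep raw bytes until text is needed


@dataclass
class Scalar:
    """Hypothetical value that carries raw bytes plus an optional decoded form."""
    raw: bytes
    encoding: str
    text: Optional[str] = None

    def as_text(self) -> str:
        # Decode lazily and cache the result so the cost is paid at most once.
        if self.text is None:
            self.text = self.raw.decode(self.encoding)
        return self.text


class InputSource:
    """Hypothetical input layer; the policy is attached here, not to a scope."""

    def __init__(self, chunks, encoding: str, policy: Policy):
        self.chunks = list(chunks)
        self.encoding = encoding
        self.policy = policy

    def read(self) -> Scalar:
        value = Scalar(raw=self.chunks.pop(0), encoding=self.encoding)
        if self.policy is Policy.PESSIMISTIC:
            value.as_text()  # force the conversion immediately
        return value


# Two data paths in one scope, each with its own switch setting.
log = InputSource(["привет".encode("koi8_r")], "koi8_r", Policy.OPTIMISTIC)
form = InputSource(["привет".encode("utf-8")], "utf-8", Policy.PESSIMISTIC)

a, b = log.read(), form.read()
```

In this sketch, a stays untouched ("raw") until something calls a.as_text(), while b is converted the moment it is read. Both end up as the same text once it is actually needed.
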

Remember also that the scalability of Perl will depend on allowing
different policy decisions on this matter. A tiny, slow Perl would
probably force everything to one representation immediately. A large,
fast Perl might have code to do regex matching in anything from Big-5
to KOI-8. (So it behooves us to write the basic regex algorithm in an
encoding-agnostic form, and then find some way to efficiently tie that
to particular encodings. Indeed, the regex engine itself had better
be easily portable to Java and C#.)

It doesn't matter how fast the CPU or how big the memory--I think we'll
always need to be able to trade CPU and memory off for each other. Part
of the endearing quality of Perl 5 is that it generally tries to outsmart
the programmer at every turn on this issue, but that might not be the
best approach in the long run. "use less" wasn't intended to be useless.

Larry