RE: Should MY:: be a real symbol table?

Brent Dax Mon, 03 Sep 2001 16:56:54 -0700
# -----Original Message-----
# From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
# Sent: Monday, September 03, 2001 4:31 PM
# To: Ken Fox; Brent Dax
# Cc: Simon Cozens; [EMAIL PROTECTED]
# Subject: Re: Should MY:: be a real symbol table?
#
# >Lexicals are fundamentally different from Perl's package
...
# No, actually, they're not.
#
# The big difference between lexical variables and package variables is
# that lexicals are looked up by stash offset and package variables are
# looked up by name. (Okay, there are a few minor details beyond that,
# but that's really the big one) There really isn't anything special
# about a stash. All it is is a hash perl thinks contains variable
# names. (And it has GVs instead of [SHA]Vs, but once again a trivial
# difference, and one going away for perl 6)
#
# The real question, as I see it, is "Should we look lexicals up by
# name?" And the answer is Yes. Larry's decreed it, and it makes sense.
# (I'm half-tempted to hack up something to let it be done in perl
# 5--wouldn't take much work)
#
# The less real question, "Should pads be hashes or arrays", can be
# answered by "whichever is ultimately cheaper". My bet is we'll
# probably keep the array structure with embedded names, and do a
# linear search for those rare times you're actually looking by name.

Yay, someone understood what I was saying!  :^)

As far as expense goes, I think this can be just as fast as our current
offset-into-the-pad method.

If we allocate the stash at compile time (so the HEs don't change), we
can resolve lexicals down to the HE.  In essence, the HE would be
doing the job a GV does in Perl 5 for globals, or an offset does for
lexicals on array-of-array pads--indirection.  (Obviously, for
pre-compiled code the HE address would go in the fixup section.)

For those who don't understand my ravings:

        sub foo { my($bar, @baz); ... }

becomes:

        CV {
                refcount ----------------------> 1
                opcodes -----------------------> ...
                padstash ----------------------+
                ...                            |
        }                                      |
                                               |
        STASH { <----------------------------+
                HE (Hash Entry) { (0x1)
                        key ---------------------> '$bar'
                        value -------------------> SV *
                        ...
                }

                HE { (0x2)
                        key --------------------> '@baz'
                        value ------------------> SV *
                        ...
                }
                ...
        }

At compile time, we can allocate and fill the stash.  Then, _still at
compile time_, we determine which HE will contain the value.  For
example, we know that the value slot of the hash entry at 0x1 will
contain the SV currently representing $bar.

Now, we can change the actual SV containing the current value of $bar at
will.  As long as the HE doesn't change, we're safe.
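To make the indirection concrete, here is a rough Perl-level sketch.  It
is only a model, not how perl's internals are written: the hashrefs
standing in for HEs and the names %padstash and $he_bar are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical model: the pad stash is a hash whose entries ("HEs") are
# allocated once, at "compile time", and never move afterwards.
my %padstash = (
    '$bar' => { key => '$bar', value => undef },
    '@baz' => { key => '@baz', value => undef },
);

# "Compile time": resolve the name to its entry once and keep the pointer.
my $he_bar = $padstash{'$bar'};

# "Run time": no name lookup at all, just follow the saved entry.
# The SV in the value slot can be swapped freely; the entry stays put.
my $first  = 10;
my $second = 20;

$he_bar->{value} = \$first;
print ${ $he_bar->{value} }, "\n";           # prints 10

$he_bar->{value} = \$second;
print ${ $he_bar->{value} }, "\n";           # prints 20

# A by-name lookup reaches the very same entry.
print ${ $padstash{'$bar'}{value} }, "\n";   # prints 20
```

The only thing the "compiled" code holds on to is $he_bar, which is why
the entry must not be reallocated after compilation.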

Since we're now looking up our variable names in a hash instead of an
array (remember, Perl hashes are only about 15% slower than arrays),
when we do have to look up a lexical at runtime we avoid an expensive
linear search.  (I don't know how the offsets are determined at
compile-time in Perl 5, but if they're also determined by a linear
search, we'll make compilation more efficient too.)
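As a sketch of the two by-name lookups being compared (again a toy
model; @pad_array, %pad_hash and both lookup subs are hypothetical names,
and nothing here measures real pad performance):

```perl
use strict;
use warnings;

# Array-style pad with embedded names (hypothetical layout).
my @pad_array = (
    { name => '$bar', value => 1 },
    { name => '@baz', value => [ 2, 3 ] },
);

# By-name lookup on the array pad: walk the entries until a name matches.
sub lookup_linear {
    my ($name) = @_;
    for my $entry (@pad_array) {
        return $entry if $entry->{name} eq $name;
    }
    return;    # not found
}

# Stash-style pad: the same entries, indexed by name.
my %pad_hash;
$pad_hash{ $_->{name} } = $_ for @pad_array;

# By-name lookup on the stash: a single hash fetch.
sub lookup_hashed {
    my ($name) = @_;
    return $pad_hash{$name};
}

# Both lookups find the same entry; only the cost of getting there differs.
print lookup_linear('@baz') == lookup_hashed('@baz') ? "same\n" : "different\n";
```

With two variables the difference is nil; the linear walk only starts to
hurt as the number of lexicals in a scope grows.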

Obviously, the current array-of-array pads are more compact than a
stash; however, I doubt that will be a serious issue.

~~~~~~~~~~

As for the temp() thing I mentioned earlier, compare these two pieces
of code:

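        # Lexical version: each call gets its own $x.
        sub factorial {
                my($x)=shift;
                return 1 if($x<=1);     # <= so factorial(0) terminates too
                return $x * factorial($x-1);
        }

        # temp() version: each call gets a new value for $x.
        sub factorial {
                temp($x)=shift;
                return 1 if($x<=1);     # <= so factorial(0) terminates too
                return $x * factorial($x-1);
        }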

These subroutines recurse.  However, neither sub gets confused and tries
to modify another stack frame's $x.  In the second sub, *temp() is just
a mechanism to get a new $x*.  That's what I was talking about--I was
trying to draw an analogy between existing functionality and my
proposal.
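The same comparison can be run today in Perl 5, on the assumption that
temp() behaves like Perl 5's local (the operator it is slated to
replace).  The sub names factorial_my and factorial_local are made up
for this sketch.

```perl
use strict;
use warnings;

# Lexical version: every call gets its own $x.
sub factorial_my {
    my ($x) = shift;
    return 1 if $x <= 1;
    return $x * factorial_my($x - 1);
}

our $x;    # one package variable, shared by every call

# local version (standing in for temp): each call saves the old value of
# the package $x, installs a new one, and restores the old one on return.
sub factorial_local {
    local $x = shift;
    return 1 if $x <= 1;
    return $x * factorial_local($x - 1);
}

print factorial_my(5), " ", factorial_local(5), "\n";    # prints "120 120"
```

After factorial_local() returns, the package $x is back to whatever it
was before the call, so no frame has stepped on another frame's value.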

If this point is still confusing, contact me privately and I can explain
it in more detail; if I get a bunch of requests I'll post it to the
group.

--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown
in a bloody coup by programmers flinging dead Java programs over the
walls with a trebuchet."