G'day all.

On Mon, Apr 29, 2002 at 10:53:40AM -0400, Dan Sugalski wrote:
> Welcome to my world. They're all bad ideas in some way or, rather,
> they're all equally as good.

That's definitely true. :-)

I should also point out that for Perl, all this is moot. Current Perl
semantics require that everything be stored in a stash across calls,
because the stash is accessible by other means. I haven't analyzed the
relevant docs closely enough, but I expect the rules for Perl 6 to be
similar. So the caller-save vs callee-save problem is probably only
relevant for other languages.

> Yup, caller save doesn't necessarily favor code like this. On the
> other hand, it does favor recursive code and code with heavier-weight
> bodies.

Not quite. If the caller and callee are of equal "weight" (e.g.
recursive code), it won't make a difference, because about the same
number of registers will be saved either way; the only difference is
who saves them.

Heavy-weight subs which call light-weight subs (the OO leaf method
example) get better treatment with caller-save, because the callee
doesn't have much to save. Light-weight subs which call heavy-weight
subs get better treatment with callee-save, because the caller doesn't
have much to save.

> >Maybe half a dozen registers used in each one at the most. A caller is
> >obliged to save _all_ its registers in order to call these trivial
> >subs.
>
> Nope, just the registers it needs preserved across calls.

...which, in languages other than Perl, should be a lot for any
non-trivial sub. On real machines (as opposed to virtual machines),
compilers go to a lot of trouble to use registers if they can. This
won't change with a virtual machine, because compiler writers expect
that if they store something in a register, it will be in L1 cache
most of the time, which is as close to a "real" register as you can
get in interpreted code.

> Assuming that the caller and callee spend about the same amount of
> time saving data (An assumption that's a bit dodgy there, as it's
> based on the coding style the language encourages) caller-save is
> ultimately a bit less expensive, as making tail calls avoids a
> push/pop pair, which callee-save can't avoid.

Tail calls are an interesting case.

If you make a tail call, you don't care what the registers contain,
because you don't want control back after the call. The same goes for
continuations. The problem is when you make a normal call to a sub
which ends with a tail call. There are several solutions.

One is to save registers in the sub prologue and then restore them
just before the tail call; that's the push/pop pair you note. This has
the effect of re-saving and re-restoring the same registers at various
points along a chain of tail calls. You can optimize tail recursion,
or tail calls between code in the same module, of course, but in
general you end up doing multiple saves and restores.

Another is Appel and Shao's method, which in essence threads N extra
arguments through each continuation, where N is the number of
callee-save registers:

    http://flint.cs.yale.edu/flint/publications/callee.html

They've found significant speedups to be gained by supporting a
smallish number of callee-save registers on real CPUs. (This may or
may not translate to speedups in Parrot, because Appel's ML compiler
uses continuation-passing style primarily; "normal" subs are not as
common and loops are non-existent.)
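To make that concrete, here's a toy sketch of the idea in Haskell
(the names and types are mine; the real transformation happens on
closure-converted code inside the compiler, not on source like this):

    -- Plain CPS: anything the caller still needs after the call has
    -- to be captured in the continuation's closure, i.e. spilled.
    plainAdd :: Int -> Int -> (Int -> r) -> r
    plainAdd x y k = k (x + y)

    -- Appel/Shao-style: thread N "callee-save registers" (two here,
    -- c1 and c2) through every call as extra arguments.  The callee
    -- must hand them back to the continuation untouched, so a caller
    -- can keep live values in them instead of in the closure.
    csAdd :: Int -> Int -> (Int -> s -> t -> r) -> s -> t -> r
    csAdd x y k c1 c2 = k (x + y) c1 c2

A caller with a live value just passes it as c1 and picks it up again
in its continuation, no spill required.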
Yet another is to decide that tail calls and continuation calls are
different from calling "real" subs. You can do this by calling a
wrapper sub which saves registers, calls the continuation or the sub
which does tail calls, then restores when it gets control back.
Inlining the wrapper sub gives you the effect of caller-save
registers, only the caller doesn't have to worry about them.

I'm leaning towards the third solution, the wrapper subs, because it's
a more general mechanism which applies to situations other than
register saving. Consider exploiting strictness analysis in Haskell,
for example. If you know that some function always evaluates some
argument, then you could make the function assume that the argument is
already evaluated, and implement a small wrapper function which
evaluates the argument and then passes control to the optimized
version. (Rough sketches of both are below.)

Cheers,
Andrew Bromage
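P.S. Here's a rough sketch of the wrapper idea, again in Haskell
purely to show the shape of it; the real thing would be a Parrot-level
calling convention, and the names here are made up:

    -- Pretend the VM register file is just a value we can copy around.
    data Regs = Regs { i0, i1, i2 :: Int }

    -- The wrapper: stash the caller's registers, run the body (which
    -- may tail-call or invoke continuations and clobber whatever it
    -- likes), then restore the saved set when control comes back.
    withSavedRegs :: Regs -> (Regs -> (a, Regs)) -> (a, Regs)
    withSavedRegs saved body =
        let (result, _clobbered) = body saved
        in  (result, saved)

Inline withSavedRegs into the caller and you get caller-save
behaviour without the caller's own code having to spell it out.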
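And the strictness-analysis example, in the usual worker/wrapper
shape (simplified; a real compiler would typically also unbox the
argument, which I've left out):

    -- Strictness analysis says f always evaluates its argument, so
    -- the wrapper forces it and jumps straight to the worker...
    f :: Int -> Int
    f x = x `seq` fWorker x

    -- ...and the worker is compiled assuming x is already evaluated,
    -- so it never has to check for an unevaluated thunk.
    fWorker :: Int -> Int
    fWorker x = x * x + 1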