G'day all.

On Mon, Apr 29, 2002 at 10:53:40AM -0400, Dan Sugalski wrote:
> Welcome to my world. They're all bad ideas in some way or, rather,
> they're all equally as good.

That's definitely true. :-)

I should also point out that for Perl, all this is moot. Current Perl
semantics require that everything be stored in a stash across calls,
because the stash is accessible by other means. I haven't analyzed the
relevant docs closely enough, but I expect the rules for Perl 6 to be
similar. So the caller-save vs callee-save problem is probably only
relevant for other languages.

> Yup, caller save doesn't necessarily favor code like this. On the
> other hand, it does favor recursive code and code with heavier-weight
> bodies.

Not quite. If the caller and callee are of equal "weight" (e.g.
recursive code), it won't make a difference, because about the same
number of registers will be saved either way; the only difference is
who saves them.

Heavy-weight subs which call light-weight subs (the OO leaf method
example) get better treatment with caller-save, because the callee
doesn't have much to save. Light-weight subs which call heavy-weight
subs get better treatment with callee-save, because the caller doesn't
have much to save.

> >Maybe half a dozen registers used in each one at the most. A caller is
> >obliged to save _all_ its registers in order to call these trivial
> >subs.
>
> Nope, just the registers it needs preserved across calls.

...which, in languages other than Perl, should be a lot for any
non-trivial sub. On real machines (as opposed to virtual machines),
compilers go to a lot of trouble to use registers if they can. This
won't change with a virtual machine, because compiler writers expect
that if they store something in a register, it will be in L1 cache
most of the time, which is as close to a "real" register as you can
get in interpreted code.

> Assuming that the caller and callee spend about the same amount of
> time saving data (An assumption that's a bit dodgy there, as it's
> based on the coding style the language encourages) caller-save is
> ultimately a bit less expensive, as making tail calls avoids a
> push/pop pair, which callee-save can't avoid.

Tail calls are an interesting case.

If you make a tail call, you don't care what the registers contain,
because you don't want control back after the call. The same goes for
continuations. The problem is when you make a normal call to a sub
which ends with a tail call. There are several solutions.

One is to save registers in the sub prologue and then restore them
just before the tail call; that's the push/pop pair you note. This has
the effect of re-saving and re-restoring the same registers at various
points along a chain of tail calls. You can optimize tail recursion,
or tail calls between code in the same module, of course, but in
general you end up doing multiple saves and restores.

Another is Appel and Shao's method, which in essence threads N extra
arguments through each continuation, where N is the number of
callee-save registers:

    http://flint.cs.yale.edu/flint/publications/callee.html

They've found significant speedups to be gained by supporting a
smallish number of callee-save registers on real CPUs. (This may or
may not translate to speedups in Parrot, because Appel's ML compiler
uses continuation-passing style primarily; "normal" subs are not as
common and loops are non-existent.)
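To make that concrete, here's a toy sketch of the idea in Haskell
(the names and types are mine; the real transformation happens on
closure-converted code inside the compiler, not on source like this):

    -- Plain CPS: anything the caller still needs after the call has
    -- to be captured in the continuation's closure, i.e. spilled.
    plainAdd :: Int -> Int -> (Int -> r) -> r
    plainAdd x y k = k (x + y)

    -- Appel/Shao-style: thread N "callee-save registers" (two here,
    -- c1 and c2) through every call as extra arguments.  The callee
    -- must hand them back to the continuation untouched, so a caller
    -- can keep live values in them instead of in the closure.
    csAdd :: Int -> Int -> (Int -> s -> t -> r) -> s -> t -> r
    csAdd x y k c1 c2 = k (x + y) c1 c2

A caller with a live value just passes it as c1 and picks it up again
in its continuation, no spill required.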
Yet another is to decide that tail calls and continuation calls are
different from calling "real" subs. You can do this by calling a
wrapper sub which saves registers, calls the continuation or the sub
which does tail calls, then restores when it gets control back.
Inlining the wrapper sub gives you the effect of caller-save
registers, only the caller doesn't have to worry about them.

I'm leaning towards the third solution, the wrapper subs, because it's
a more general mechanism which applies to situations other than
register saving. Consider exploiting strictness analysis in Haskell,
for example. If you know that some function always evaluates some
argument, then you could make the function assume that the argument is
already evaluated, and implement a small wrapper function which
evaluates the argument and then passes control to the optimized
version. (Rough sketches of both are below.)

Cheers,
Andrew Bromage
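P.S. Here's a rough sketch of the wrapper idea, again in Haskell
purely to show the shape of it; the real thing would be a Parrot-level
calling convention, and the names here are made up:

    -- Pretend the VM register file is just a value we can copy around.
    data Regs = Regs { i0, i1, i2 :: Int }

    -- The wrapper: stash the caller's registers, run the body (which
    -- may tail-call or invoke continuations and clobber whatever it
    -- likes), then restore the saved set when control comes back.
    withSavedRegs :: Regs -> (Regs -> (a, Regs)) -> (a, Regs)
    withSavedRegs saved body =
        let (result, _clobbered) = body saved
        in  (result, saved)

Inline withSavedRegs into the caller and you get caller-save
behaviour without the caller's own code having to spell it out.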
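And the strictness-analysis example, in the usual worker/wrapper
shape (simplified; a real compiler would typically also unbox the
argument, which I've left out):

    -- Strictness analysis says f always evaluates its argument, so
    -- the wrapper forces it and jumps straight to the worker...
    f :: Int -> Int
    f x = x `seq` fWorker x

    -- ...and the worker is compiled assuming x is already evaluated,
    -- so it never has to check for an unevaluated thunk.
    fWorker :: Int -> Int
    fWorker x = x * x + 1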