I didn't get much response last time I brought this up. But as we really have to improve calling speed, here is the original text again. Some additional remarks follow below it.

-------- Original Message --------
Subject: Register stacks again
Date: Sat, 08 May 2004 13:29:20 +0200
From: Leopold Toetsch <[EMAIL PROTECTED]>
To: Perl 6 Internals <[EMAIL PROTECTED]>

For reference Dan's blog entry at:
http://www.sidhe.org/~dan/blog/archives/000321.html

I think that the situation has changed again and that we should consider
the original register in the stack scheme.

1) In the presence of MMD and delegate functions, object accessor methods
and OO in general, we'll probably do a lot more function calls than
assumed in the article.

2) Hardware CPU speeds increase much faster than RAM access timings. The
relative cost of one extra pointer indirection is decreasing steadily.

3) JIT can execute as fast as now if we reserve one register as the
stack frame pointer. JIT's speed isn't gained so much by the absolute
address access, but by avoiding any memory access. That is: at the beginning
of a basic block fill CPU registers (mov mem, %reg), then execute as
much as possible just in registers and at the end of a basic block store
hardware regs back to Parrot's. The "mov mem, %reg" or "mov %reg, mem"
operations are the expensive part of it (2 cycles instead of 1 mostly).

So a register fetch could look like "mov x(%ebp), %reg", where x is the
offset in the register array and %ebp the register frame pointer. [1]
This instruction has the same execution speed as the absolute memory access.

4) The absolute addressing scheme in prederefed and JIT run cores
implies a separate "compilation" (predereferencing or JITting) of the
code per thread. This additionally has some memory costs and possible
negative cache effects when different threads execute different code.

[1] yes I'm thinking of just one frame pointer :) Well, the whole scheme
makes a stack machine out of Parrot:

I0,...In, In+1....Ik, Ik+1...Il
^
|
frame
pointer

I0..In are the incoming function arguments
In+1..Ik are the working registers of the function
Ik+1...Il are outgoing args for a function call
Calling a function is just a move of the frame pointer to "Ik".
The working register file size is adjusted to the actual needs of the
function.

Actually registers would be arranged (I0,S0,P0,N0,I1,S1...)
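
A minimal sketch in C of the frame-pointer move described above, to make the idea concrete. Everything named here (reg_stack, frame_call, frame_return, callee_add, STACK_SLOTS) is hypothetical and not Parrot code. It assumes only integer registers and ignores the interleaved (I0,S0,P0,N0,...) arrangement. The offset of the caller's first outgoing register is passed in explicitly.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SLOTS 1024          /* assumed size of the linear register stack */

typedef long intval;              /* stand-in for one integer register */

typedef struct {
    intval  slots[STACK_SLOTS];   /* one contiguous register stack */
    intval *fp;                   /* frame pointer: the current function's I0 */
} reg_stack;

void reg_stack_init(reg_stack *s) { s->fp = s->slots; }

/* Enter a callee: slide the frame pointer up to the caller's first
 * outgoing register, which thereby becomes the callee's I0.
 * Returns the caller's frame pointer so it can be restored later. */
intval *frame_call(reg_stack *s, size_t first_outgoing)
{
    intval *caller_fp = s->fp;
    assert(s->fp + first_outgoing < s->slots + STACK_SLOTS);
    s->fp += first_outgoing;
    return caller_fp;
}

/* Leave the callee: just move the frame pointer back. */
void frame_return(reg_stack *s, intval *caller_fp) { s->fp = caller_fp; }

/* Hypothetical callee: adds its incoming I0 and I1, result in I0. */
void callee_add(reg_stack *s) { s->fp[0] = s->fp[0] + s->fp[1]; }

/* Caller with 2 incoming and 3 working registers, so its outgoing
 * arguments start at offset 5 from its own frame pointer. */
intval demo_call(void)
{
    static reg_stack s;           /* static: zero-initialised, not on the C stack */
    intval *saved;

    reg_stack_init(&s);
    s.fp[5] = 40;                 /* outgoing arg 0 */
    s.fp[6] = 2;                  /* outgoing arg 1 */

    saved = frame_call(&s, 5);    /* no copying of arguments */
    callee_add(&s);
    frame_return(&s, saved);

    return s.fp[5];               /* the callee's I0 is the caller's slot 5 */
}
```

No register contents are copied on call or return in this sketch; the cost is one pointer adjustment each way.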

Is this scheme just crazy?
leo

------------------
ad 3), 4) all prederefed cores and the JIT run core now use absolute register addresses. This currently needs reJITing and repredereferencing for each new thread. With an addressing scheme relative to *one* base or frame pointer, this isn't necessary anymore - again at the expense of one CPU hardware register.
When the emitted code for a CPU register access is like:


  Rx := base_pointer[const_offset]

there is very likely no measurable slowdown in these run loops. The base_pointer needs only reloading after an C<invoke> statement. Please note that the normal function core does the same addressing with the interpreter pointer.
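
To illustrate the difference, here is a rough sketch in C of a predereferenced "add" op in both flavours. The type and function names (interp, add_abs, add_rel and so on) are made up for this example and are not the real prederef data structures. The absolute variant bakes in pointers to one interpreter's registers, while the relative variant holds only constant offsets and can be shared by all threads.

```c
#include <assert.h>
#include <stddef.h>

typedef long intval;                 /* stand-in for one integer register */

enum { NUM_REGS = 32 };

/* One register file per interpreter, i.e. per thread (simplified). */
typedef struct { intval regs[NUM_REGS]; } interp;

/* Absolute addressing: operands are pointers into ONE interpreter. */
typedef struct { intval *dst, *a, *b; } add_abs;

/* Relative addressing: operands are constant offsets from a base pointer. */
typedef struct { size_t dst, a, b; } add_rel;

/* "Predereferencing" for the absolute scheme must be redone per thread,
 * because the resulting pointers belong to that thread's registers. */
add_abs prederef_abs(interp *in, size_t dst, size_t a, size_t b)
{
    add_abs op;
    assert(dst < NUM_REGS && a < NUM_REGS && b < NUM_REGS);
    op.dst = &in->regs[dst];
    op.a   = &in->regs[a];
    op.b   = &in->regs[b];
    return op;
}

void run_add_abs(const add_abs *op) { *op->dst = *op->a + *op->b; }

/* Rx := base_pointer[const_offset] for every operand. */
void run_add_rel(const add_rel *op, intval *base)
{
    base[op->dst] = base[op->a] + base[op->b];
}

/* One relative op, compiled once, executed for two interpreters. */
intval demo_shared_code(void)
{
    interp t1 = {{0}}, t2 = {{0}};
    add_rel op = {0, 1, 2};          /* I0 = I1 + I2 */

    t1.regs[1] = 1;  t1.regs[2] = 2;
    t2.regs[1] = 10; t2.regs[2] = 20;

    run_add_rel(&op, t1.regs);       /* same op, thread 1's base */
    run_add_rel(&op, t2.regs);       /* same op, thread 2's base */

    return t1.regs[0] + t2.regs[0];  /* 3 + 30 */
}
```

In the relative variant the only per-thread state is the base pointer itself, which is what would live in the reserved CPU register.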

Eventually, given a fairly big linear stack, we could actually address as many registers as a function is using. Incoming registers could be addressed with a negative offset. Outgoing (or spilled) registers could be used with bigger positive offsets (beyond the actual working range of registers).
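
A small C sketch of that addressing, under my own assumptions: the callee's base pointer is placed directly after the caller's outgoing arguments, so the callee sees them at negative offsets. The names (callee_mul_add, demo_negative_offsets, STACK_SLOTS) and the frame sizes are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef long intval;                  /* stand-in for one integer register */

enum { STACK_SLOTS = 256 };           /* assumed size of the linear stack */

/* Hypothetical callee with two incoming arguments.
 * Incoming registers: base[-2], base[-1]
 * Working registers:  base[0], base[1], ... */
intval callee_mul_add(intval *base)
{
    base[0] = base[-2] * base[-1];    /* read incoming via negative offsets */
    base[1] = base[0] + 1;            /* plain working registers */
    return base[1];
}

intval demo_negative_offsets(void)
{
    intval stack[STACK_SLOTS] = {0};
    intval *base = stack;             /* caller's base pointer */
    const ptrdiff_t n_working = 4;    /* caller works in base[0..3] */
    const ptrdiff_t n_outgoing = 2;
    intval *callee_base;

    /* Outgoing registers sit beyond the caller's working range. */
    base[n_working + 0] = 6;
    base[n_working + 1] = 7;

    /* The callee's base starts right after the outgoing arguments. */
    callee_base = base + n_working + n_outgoing;
    assert(callee_base + 2 <= stack + STACK_SLOTS);

    return callee_mul_add(callee_base);   /* 6 * 7 + 1 */
}
```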

Comments still welcome ;)
leo


