Re: Another Optimization Target: bsr and stacks

Patrick R. Michaud Mon, 21 Apr 2008 12:26:33 -0700

On Mon, Apr 21, 2008 at 10:19:08AM -0700, chromatic wrote:
> I'm still exploring the Rakudo build progress as a profiling target for 
> likely 
> optimizations.  After this weekend's work, I have src/gen_actions.pir 
> generation down to 27,788,055,796 instructions (with an optimized Parrot).  A 
> big chunk of that time goes to support bsr_ic:
> 
>  7,784,136,854  core.ops:Parrot_bsr_ic
>  7,775,231,886  stacks.c:stack_push
>  7,763,569,145  stack_common.c:stack_prepare_push
>  7,754,735,042  stack_common.c:cst_new_stack_chunk
> 
> Why is it expensive?  *Every* call to cst_new_stack_chunk() requests a free 
> bufferlike object from the GC.  98% of the inclusive cost of these four 
> functions is in running the GC.
>
> Someone who's familiar with the stack code (or wants to be) might be able to 
> find a big optimization here.


To me, the scary part of src/stacks.c is at the beginning:

    The stack is stored as a linked list of chunks (C<Stack_Chunk>),
    where each chunk has room for one entry.

Eek!  For something like bsr_ic, which is really just pushing a
return address (i.e., opcode_t *) onto a stack, we're allocating a
new GC-able object for every bsr invocation, and freeing it on ret.
Since PGE uses bsr/ret for its backtracking, that's a lot of allocations.

In fact, this seems to be the case for everything using the
"generic stack", which AFAICT is the &interp->dynamic_env structure.
So, everything that gets pushed onto this stack (exceptions,
continuations, coroutines, bsr/ret calls) ends up with a separate
gc-able structure (Stack_Chunk) to hold the stack entry.
So, switching PGE from bsr/ret to another control structure doesn't
give us a win here.

I think we'd get a BIG win if we changed the dynamic_env stack to
have an approach similar to ResizableIntegerArray, where we allocate
arrays of Stack_Chunk entries and manage them that way, instead of
a separate allocation per element in the stack.

In looking at the stacks.c code I think I could manage this
if nobody else is interested in playing with it.

> Then again, I remember someone saying at least some parts of the stack code 
> should go away, and I'm all for that too.

The parts of the stack code we've discussed eliminating is actually 
the user_stack, used for push/pop/saveall/restoreall opcodes.  The
bsr/ret opcodes don't use this particular stack.  However, I think that
if we eliminate the user_stack, we in fact simplify stacks.c by
a tremendous amount and could get even a bigger win (including eliminating
some unions and other C nastiness).

I think it's a big enough win for PGE and Rakudo that I'll be very
happy to work on both of these items in a branch -- i.e., 
eliminating the user stack ops and converting the remaining
dynamic_env structure to be something far more suitable for
GC.  I think the performance gain we'll see will be, well, substantial.

Pm

Re: Another Optimization Target: bsr and stacks

Reply via email to