On Mon, Apr 21, 2008 at 10:19:08AM -0700, chromatic wrote: > I'm still exploring the Rakudo build progress as a profiling target for > likely > optimizations. After this weekend's work, I have src/gen_actions.pir > generation down to 27,788,055,796 instructions (with an optimized Parrot). A > big chunk of that time goes to support bsr_ic: > > 7,784,136,854 core.ops:Parrot_bsr_ic > 7,775,231,886 stacks.c:stack_push > 7,763,569,145 stack_common.c:stack_prepare_push > 7,754,735,042 stack_common.c:cst_new_stack_chunk > > Why is it expensive? *Every* call to cst_new_stack_chunk() requests a free > bufferlike object from the GC. 98% of the inclusive cost of these four > functions is in running the GC. > > Someone who's familiar with the stack code (or wants to be) might be able to > find a big optimization here.
To me, the scary part of src/stacks.c is at the beginning: The stack is stored as a linked list of chunks (C<Stack_Chunk>), where each chunk has room for one entry. Eek! For something like bsr_ic, which is really just pushing a return address (i.e., opcode_t *) onto a stack, we're allocating a new GC-able object for every bsr invocation, and freeing it on ret. Since PGE uses bsr/ret for its backtracking, that's a lot of allocations. In fact, this seems to be the case for everything using the "generic stack", which AFAICT is the &interp->dynamic_env structure. So, everything that gets pushed onto this stack (exceptions, continuations, coroutines, bsr/ret calls) ends up with a separate gc-able structure (Stack_Chunk) to hold the stack entry. So, switching PGE from bsr/ret to another control structure doesn't give us a win here. I think we'd get a BIG win if we changed the dynamic_env stack to have an approach similar to ResizableIntegerArray, where we allocate arrays of Stack_Chunk entries and manage them that way, instead of a separate allocation per element in the stack. In looking at the stacks.c code I think I could manage this if nobody else is interested in playing with it. > Then again, I remember someone saying at least some parts of the stack code > should go away, and I'm all for that too. The parts of the stack code we've discussed eliminating is actually the user_stack, used for push/pop/saveall/restoreall opcodes. The bsr/ret opcodes don't use this particular stack. However, I think that if we eliminate the user_stack, we in fact simplify stacks.c by a tremendous amount and could get even a bigger win (including eliminating some unions and other C nastiness). I think it's a big enough win for PGE and Rakudo that I'll be very happy to work on both of these items in a branch -- i.e., eliminating the user stack ops and converting the remaining dynamic_env structure to be something far more suitable for GC. I think the performance gain we'll see will be, well, substantial. Pm