I'm still exploring the Rakudo build process as a profiling target for likely optimizations. After this weekend's work, I have src/gen_actions.pir generation down to 27,788,055,796 instructions (with an optimized Parrot). A big chunk of that, about 28%, goes to supporting bsr_ic:
    7,784,136,854  core.ops:Parrot_bsr_ic
    7,775,231,886  stacks.c:stack_push
    7,763,569,145  stack_common.c:stack_prepare_push
    7,754,735,042  stack_common.c:cst_new_stack_chunk

(These counts are inclusive: they cover everything each function calls, not just the function's own instructions.)

Why is it expensive? *Every* call to cst_new_stack_chunk() requests a free bufferlike object from the GC. 98% of the inclusive cost of these four functions is in running the GC.

Someone who's familiar with the stack code (or wants to be) might be able to find a big optimization here. I won't rule out the possibility that these stack operations should be able to recycle freshly unused stack chunks, to replenish the free list without doing a full GC run.

Then again, I remember someone saying at least some parts of the stack code should go away, and I'm all for that too.

-- c
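
P.S. To make the recycling idea concrete, here's a rough sketch in C. It is not Parrot's code: the chunk struct, every function name, and the use of malloc() in place of the GC's bufferlike allocator are made up for illustration. The only point is that a popped chunk goes onto a per-stack free list, and a push takes from that list before asking the allocator for anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stack chunk; not Parrot's real struct. */
typedef struct chunk {
    struct chunk *next;     /* link used on the live stack and on the free list */
    void         *payload;  /* e.g. the return address saved by a bsr */
} chunk;

typedef struct chunk_stack {
    chunk  *top;            /* live chunks, most recent first */
    chunk  *free_list;      /* popped chunks waiting to be reused */
    size_t  fresh_allocs;   /* how many times we went to the allocator */
} chunk_stack;

/* Take a chunk from the free list if there is one; only otherwise
 * allocate. malloc() stands in for the expensive GC request. */
static chunk *chunk_get(chunk_stack *s) {
    chunk *c = s->free_list;
    if (c != NULL) {
        s->free_list = c->next;
        return c;
    }
    c = malloc(sizeof *c);
    if (c != NULL)
        s->fresh_allocs++;
    return c;
}

/* Push a payload; returns 1 on success, 0 if allocation failed. */
int stack_push_sketch(chunk_stack *s, void *payload) {
    chunk *c = chunk_get(s);
    if (c == NULL)
        return 0;
    c->payload = payload;
    c->next    = s->top;
    s->top     = c;
    return 1;
}

/* Pop the top payload and recycle its chunk onto the free list. */
void *stack_pop_sketch(chunk_stack *s) {
    chunk *c = s->top;
    void  *payload;
    assert(c != NULL);      /* popping an empty stack is a caller bug */
    payload      = c->payload;
    s->top       = c->next;
    c->next      = s->free_list;
    s->free_list = c;
    return payload;
}

/* Release every chunk, live or recycled. */
void chunk_stack_destroy(chunk_stack *s) {
    chunk *lists[2];
    int    i;
    lists[0] = s->top;
    lists[1] = s->free_list;
    for (i = 0; i < 2; i++) {
        chunk *c = lists[i];
        while (c != NULL) {
            chunk *next = c->next;
            free(c);
            c = next;
        }
    }
    s->top = s->free_list = NULL;
}
```

With this shape, a tight loop of matched pushes and pops (which is what a bsr/ret pair amounts to) hits the allocator once and then reuses the same chunk. The catch in the real code is that a chunk may only be recycled if nothing else still refers to it, so anything that can hold on to part of a stack would need checking before a chunk goes back on the free list.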