Some additional remarks: The "memset" in smallobject.c is not necessary on Linux. mmap() (which evidently gets called for memalign, at least for this arena size) does clear the memory. We need some tests to find out from which size upwards memory is cleared for malloc and memalign. I tossed the memset for now and saved ~450,000 L2 misses, or ~0.2 s.
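
A test for the "from which size is memory cleared" question could look roughly like the sketch below. It is not code from the parrot tree: the function names are made up for illustration, and posix_memalign() stands in for memalign(). It dirties the heap first, then checks whether a freshly allocated block of a given size comes back all zero. A 1 only says the block happened to be zero on this run (fresh pages from the kernel, for instance); it is an observation, not a guarantee, and the result depends on the C library and its mmap threshold.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if all n bytes at p are zero, 0 otherwise. */
int is_all_zero(const unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Fill and free a block so that recycled heap memory is visibly dirty. */
void dirty_heap(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        memset(p, 0xA5, size);
        free(p);
    }
}

/*
 * Hypothetical probe: allocate `size` bytes (page-aligned via
 * posix_memalign if `aligned` is non-zero, plain malloc otherwise)
 * and inspect the uninitialised contents on purpose.
 * Returns 1 if the block was all zero, 0 if not, -1 if allocation failed.
 */
int fresh_block_is_zeroed(size_t size, int aligned)
{
    void *p = NULL;
    int zeroed;

    assert(size > 0);
    dirty_heap(size);
    if (aligned) {
        if (posix_memalign(&p, 4096, size) != 0)
            return -1;
    } else {
        p = malloc(size);
        if (p == NULL)
            return -1;
    }
    zeroed = is_all_zero((const unsigned char *)p, size);
    free(p);
    return zeroed;
}

/* Print one line per size, from 1 KB to 4 MB in steps of x4. */
void probe_sizes(void)
{
    size_t size;
    for (size = 1024; size <= ((size_t)1 << 22); size *= 4)
        printf("%8zu bytes: malloc=%d memalign=%d\n", size,
               fresh_block_is_zeroed(size, 0),
               fresh_block_is_zeroed(size, 1));
}
```

Calling probe_sizes() from a small main() prints the observed behaviour per size. Only if the arena size used in smallobject.c reliably shows 1 on all platforms we care about would dropping the memset be safe everywhere; otherwise it should stay conditional.
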
I did some optimizations in list.c to avoid generating sparse lists: When it's clear that the whole list gets filled, the programmer/compiler should insert a

    set P0, I0    # set size of list

before setting the first element. The rules for which grow_type is chosen when have been straightened out and are better documented (in list.c).

So with these two refinements, I have new numbers for some stress tests:

              stress  stress1  stress2  life
CVS           1.00    -        1.44     721
SPMC          0.60    12.0     1.50     793
my current    0.33     8.8     1.24     800
perl 5.8.0    0.6     12.0     2.41     -

stress1 does 10 times the (10+20+20) allocations of 200,000 elements.

I'll check it in soon.

Stress tests are in seconds, life test is generations/sec, -O3 compiled parrot, JIT runtime (-P isn't slower here), i386/linux.

SPMC ...    the patch with smaller (-8 bytes) PMC
current ... additionally DOD flags moved to arena + above

Have fun,
leo

Acknowledgments: valgrind and ccache are really great tools. Get them if you don't already have them.