Some additional remarks:

The "memset" in smallobject.c is not necessary on Linux. mmap() (which
evidently gets called for memalign, at least for this arena size)
does clear the memory.  We need some tests to find out from which
allocation size upward memory comes back cleared for malloc and memalign.
I tossed the memset for now and saved ~450,000 L2 misses or ~0.2 s.
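
A rough sketch of such a test, in C. None of this is from smallobject.c:
the helper names are made up for illustration, and posix_memalign()
stands in for memalign(). It checks a fresh anonymous mmap() directly,
and checks an aligned allocation after a dirtied block of the same size
was freed, since a recycled block is the case where skipping the memset
would hurt.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Return 1 if every byte in buf[0..size) is zero, else 0. */
static int is_all_zero(const unsigned char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Map `size` bytes of fresh anonymous memory and check its contents.
 * Returns 1 if zero-filled, 0 if not, -1 if mmap() failed. */
static int anon_mmap_is_zeroed(size_t size)
{
    void *p;
    int zeroed;

    assert(size > 0);               /* mmap() rejects a zero length */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    zeroed = is_all_zero(p, size);
    munmap(p, size);
    return zeroed;
}

/* Allocate, dirty and free an aligned block, then allocate the same
 * size again and check whether the new block is zero-filled.
 * Returns 1 if zeroed, 0 if not, -1 if allocation failed.
 * `alignment` must be a power of two and a multiple of sizeof(void *). */
static int recycled_memalign_is_zeroed(size_t alignment, size_t size)
{
    void *p = NULL;
    volatile unsigned char *dirty;
    size_t i;
    int zeroed;

    if (posix_memalign(&p, alignment, size) != 0)
        return -1;
    /* volatile stores, so the compiler cannot drop the writes
     * as dead stores before free() */
    dirty = p;
    for (i = 0; i < size; i++)
        dirty[i] = 0xFF;
    free(p);

    p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return -1;
    zeroed = is_all_zero(p, size);
    free(p);
    return zeroed;
}
```

Running recycled_memalign_is_zeroed() over a range of sizes (say 1 KB
up to a few MB) should show where the allocator switches from recycled
heap blocks to fresh mmap()ed memory. On glibc that switch is governed
by M_MMAP_THRESHOLD, so the result is allocator-specific and says
nothing about other platforms.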

I did some optimizations in list.c to avoid generating sparse lists:
When it's clear that the whole list gets filled, the programmer/compiler
should insert a
  set P0, I0 # set size of list
before setting the first element. The rules for which grow_type is
chosen when have been straightened out and are better documented (in list.c).
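
To illustrate why presizing helps, here is a toy sketch in C. It is not
the list.c implementation (which is chunk-based): IntList and all
function names are hypothetical, and the doubling growth policy is an
assumption chosen for the example. list_set_size() plays the role of
"set P0, I0".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable list of longs; NOT Parrot's list structure. */
typedef struct {
    long  *items;
    size_t length;    /* elements in use */
    size_t capacity;  /* elements allocated */
    size_t reallocs;  /* how often storage was (re)allocated */
} IntList;

static void list_init(IntList *l)
{
    l->items = NULL;
    l->length = l->capacity = l->reallocs = 0;
}

static void list_free(IntList *l)
{
    free(l->items);
    list_init(l);
}

/* Make room for at least n elements. Returns 0 on success, -1 on failure. */
static int list_reserve(IntList *l, size_t n)
{
    long *p;

    if (n <= l->capacity)
        return 0;
    p = realloc(l->items, n * sizeof *p);
    if (p == NULL)
        return -1;
    /* zero only the newly added tail */
    memset(p + l->capacity, 0, (n - l->capacity) * sizeof *p);
    l->items = p;
    l->capacity = n;
    l->reallocs++;
    return 0;
}

/* Analogue of "set P0, I0": fix the size up front with one allocation. */
static int list_set_size(IntList *l, size_t n)
{
    if (list_reserve(l, n) != 0)
        return -1;
    if (n > l->length)
        memset(l->items + l->length, 0, (n - l->length) * sizeof *l->items);
    l->length = n;
    return 0;
}

/* Store v at idx, growing by doubling (minimum 8) when idx is out of range. */
static int list_set(IntList *l, size_t idx, long v)
{
    if (idx >= l->capacity) {
        size_t want = l->capacity * 2;

        if (want < 8)
            want = 8;
        if (want < idx + 1)
            want = idx + 1;
        if (list_reserve(l, want) != 0)
            return -1;
    }
    if (idx >= l->length)
        l->length = idx + 1;
    l->items[idx] = v;
    return 0;
}

/* Fill n elements, optionally presized; return the number of
 * allocations, or (size_t)-1 if memory ran out. */
static size_t fill_and_count_reallocs(size_t n, int presize)
{
    IntList l;
    size_t i, count;

    list_init(&l);
    if (presize && list_set_size(&l, n) != 0)
        return (size_t)-1;
    for (i = 0; i < n; i++) {
        if (list_set(&l, i, (long)i) != 0) {
            list_free(&l);
            return (size_t)-1;
        }
    }
    count = l.reallocs;
    list_free(&l);
    return count;
}
```

With this toy policy, filling 1000 elements takes one allocation when
presized and several when the list has to grow on demand.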

So with these two refinements, I have new numbers for some stress tests:

               stress  stress1  stress2  life
CVS              1.00              1.44   721
SPMC             0.60     12.0     1.50   793
my current       0.33      8.8     1.24   800
perl 5.8.0       0.6      12.0     2.41

stress1 does 10 times the (10+20+20) allocations of 200,000 elements.
I'll check it in soon.

stress tests are in seconds, life test is generations/sec, -O3
compiled parrot, JIT runtime (-P isn't slower here), i386/linux.

SPMC ... the patch with the smaller (-8 bytes) PMC
my current ... SPMC, plus DOD flags moved to the arena, plus the two
               refinements above

Have fun,
leo

Acknowledgments: valgrind and ccache are really great tools. Get them
if you don't already have them.
