As Nick seems to be hitting some limits, I've tried this: stuff a differing amount of PMCs into an Array and time a DOD run:
$ parrot -j 10m.pasm
create 100000 PerlInts	0.064826
DOD sweeps: 4 one is	0.013948
$ parrot -j 10m.pasm
create 500000 PerlInts	0.698430
DOD sweeps: 12 one is	0.070052
$ parrot -j 10m.pasm
create 1e+06 PerlInts	2.051168
DOD sweeps: 21 one is	0.141291
$ parrot -j 10m.pasm
create 2e+06 PerlInts	7.137497
DOD sweeps: 40 one is	0.278473
$ parrot -j 10m.pasm
create 3e+06 PerlInts	15.232575
DOD sweeps: 59 one is	0.416493
$ parrot -j 10m.pasm
create 4e+06 PerlInts	26.496387
DOD sweeps: 78 one is	0.555497
Athlon 800, optimized build. These 78 DOD runs, each taking half a second, are of course totally in vain: there is nothing to be recycled, everything is kept alive by the array. So somewhere around one million live PMCs, things start to look really ugly.
That's with Scalars, which don't have a next_for_GC pointer; they are marked directly in the fast path.
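To illustrate why these sweeps are pure overhead, here is a toy mark-and-sweep sketch in Python (hypothetical; not Parrot's actual DOD code): each sweep must touch every live object during the mark phase, so when everything is reachable from the array the sweep does O(live) work and reclaims nothing, which is exactly the linear growth seen in the timings above.

```python
# Toy mark-and-sweep model (hypothetical, for illustration only).
class PMC:
    __slots__ = ("live",)
    def __init__(self):
        self.live = False

def dod_sweep(root_array, heap):
    """Mark everything reachable from the root array, then sweep the rest."""
    for pmc in root_array:                   # mark phase: O(live objects)
        pmc.live = True
    survivors = [p for p in heap if p.live]  # sweep phase: O(heap size)
    for p in survivors:
        p.live = False                       # reset marks for the next sweep
    return len(heap) - len(survivors)        # number of PMCs reclaimed

heap = [PMC() for _ in range(100_000)]
root = heap            # every PMC is held by the array, so all are live
freed = dod_sweep(root, heap)
# freed == 0: the whole sweep's work was in vain
```

The mark phase cost scales with the number of live PMCs regardless of how much garbage exists, which is why sweep time roughly doubles from 2e+06 to 4e+06 PMCs in the measurements.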
Now do the same with a PMC that *does* have a next_for_GC field (note: the created Array is really almost empty, just one Buffer_header):
$ parrot -j 10m.pasm
create 1e+06 Arrays	33.949516
DOD sweeps: 79 one is	0.907532
DOD time is a factor of 6.5 worse. The additional time goes mainly to walking the next_for_GC chain.
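A rough sketch of the difference between the two marking strategies (again a hypothetical Python model, not Parrot's C code): scalars are marked in the fast path by setting a flag in one touch, while chained PMCs are first linked into a next_for_GC list and then touched a second time when that list is walked, so each object incurs extra pointer traffic.

```python
# Hypothetical model of direct marking vs. next_for_GC chain marking.
class ChainedPMC:
    __slots__ = ("live", "next_for_GC")
    def __init__(self):
        self.live = False
        self.next_for_GC = None

def mark_direct(pmcs):
    """Fast path: one touch per object, just set the live flag."""
    for p in pmcs:
        p.live = True

def mark_via_chain(pmcs):
    """Chain path: link each object into a list, then walk it to mark."""
    head = None
    for p in pmcs:              # first touch: thread the object onto the chain
        p.next_for_GC = head
        head = p
    marked = 0
    while head is not None:     # second touch: walk the chain and mark
        head.live = True
        head = head.next_for_GC
        marked += 1
    return marked

pmcs = [ChainedPMC() for _ in range(1_000)]
chained = mark_via_chain(pmcs)  # chained == 1000, every object touched twice
```

The model only doubles the touches, so the measured 6.5x presumably also reflects cache misses from chasing the chain's pointers, which the toy version can't capture.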
$ cat 10m.pasm
# well I was optimistic :(
	set S0, 1.0E6
	set I0, S0
	time N0
	new P0, .PerlArray
	# set P0, I0		presize array or not
	set I1, 0
lp:
	new P1, .PerlInt	# .Array
	set P0[I1], P1
	inc I1
	lt I1, I0, lp
	time N1
	sub N1, N0
	print "create "
	print S0
	print " PerlInts\t"
	print N1
	print "\n"
	time N0
	sweep 1
	time N1
	sub N1, N0
	print "DOD sweeps: "
	interpinfo I10, 2
	print I10
	print " one is\t"
	print N1
	print "\n"
	end