Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 8:38 PM +0200 9/2/03, Leopold Toetsch wrote:

[ next_for_GC ]
> That's easy, then, as it is. :) Besides, from looking at pobj.h, it's
> already there, so I'm not sure what we'd need to do. Perhaps I need
> to resync.

The PMC_EXT structure is only allocated for PMCs that access PMC_data().
Adding properties adds this structure on the fly.

Plain PerlScalars don't have the PMC_EXT structure and therefore no
next_for_GC pointer. As these scalars can't refer to other PMCs, there
is no need to visit them a second time in the DOD run. They are marked by
setting their live flag directly. Done.

> ... We don't care
> about the live flag for this, what we care is that the PMC is put on
> the traversal list.

You are right, live is live.

>>When you have a PerlArray containing 100,000 plain PerlInts, you have
>>your linked list with 100,001 entries or one entry. You have polluted
>>caches or not - that's it (I'm speaking of DOD here).

> So you shortcut in the normal course of DOD?

Sure, as described above.

> ... I suppose--I presume
> you've benchmarked to see if testing all the flags of the contained
> PMCs when marking the container is faster than just throwing them on
> the list and checking them when you get to them?

Yes, I did benchmark this. The main problems were data cache misses and
PMC size. Putting the PMC on the next list pulls the whole structure
into the cache; just setting the live flag touches half a byte, which is
located in the PMC's arena.

> ... I can see that being
> the case in some cases, but I'm concerned that it'll consume more
> memory and take more time in some reasonably common cases.

For small sets of PMCs this is as fast as before. It trades a few
extra cycles (calculating the arena for a PMC) against cache pollution.
The faster CPUs get, and the more data you process, the more this scheme
wins.

And putting next_for_GC back into the main PMC structure would add about
20% memory overhead to almost all PMCs (I expect the vast majority to be
plain scalars without properties).

>>And I still don't see, how you will handle duplicate lookup of
>>self-referential structures for freeze and thaw with a linked list of
>>items alone.

> The same way the DOD sweep handles it, unless you've changed things
> beyond recognition. The way it used to work is that when you marked a
> PMC as live, it got put on the end of the chain unless the PMC's next
> pointer field is non-NULL, in which case it's already been put on the
> list, and nothing happens.

I see ... It used to work that way in 0.0.6. I'm not sure whether I
changed that, or it happened during the split of F<resources.c>. Anyway,
at that time we had mark_used() for marking PMCs and buffer_lives() for
marking Buffers.

These have been unified (my changes ;-); there is only one
pobject_lives(), which could of course put PMCs on the next-list again
if we really want that.

Ok, then back to your proposed scheme for freeze (with nested and big
structures in mind):

1) call mark() on the PMC - that is one walk through the aggregate
2) call freeze() on the PMC - i.e. a second walk through the
   same aggregate
3) if inside 2) we find a PMC: if it has next_for_GC set, it's a
   duplicate; else call mark() on it. This pulls a potentially big
   aggregate into the cache and pollutes it.
4) get the next item from the list, goto 2)

5) reset next_for_GC for all PMCs

This means we have two alternating runs through the same aggregates,
interspersed with mark runs through other aggregates. Then we have
another run over all processed PMCs to reset the next_for_GC pointer.

Step 5 is free on DOD runs (we walk the arenas anyway to destroy dead
objects). It's not free for freeze et al.

All of this is decidedly cache-unfriendly.

Then: we want a solution for clone(), dump() (or pretty_print()),
freeze(), and thaw(). That means we would need three more routines in
aggregates that all do pretty much the same thing: walk through the
aggregate's items and call mark() on PMCs, then do something.
thaw() is different anyway.

A single general traverse() vtable is just the simpler interface, IMHO.

Running through each aggregate just once and updating a seen_hash seems
simpler and faster to me, and it has no negative impact on PMC size or
DOD speed.

leo
