Re: Using known data structure hierarchy in GC and PCH?

Diego Novillo Tue, 11 Dec 2012 08:11:56 -0800

On Mon, Dec 10, 2012 at 1:25 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
> Hello,
>
> While trying to bootstrap with GCAC checking enabled and some
> instrumentation to measure how often objects are being marked, I
> noticed that a lot of cache misses happen because already-marked
> objects are being tested again (e.g. via ggc_marked_p). This struck me
> as rather odd, until I looked at what the GC markers for my favorite
> data structures (the CFG) look like.
>
> Basic blocks live in GC space because they can point to other GC-space
> objects (rtx, gimple), and these objects can point back to the basic
> block they're contained in. In addition, the (function-specific)
> basic_block_info array points to all live basic blocks and of course
> the edges have basic_block pointers as well. Finally, loop have
> pointers to the loop header and latch basic_blocks.
>
> The effect is that, on average, ggc_marked_p or one of its variants
> gets called 17 times on each basic_block, *per ggc_collect call*!
>
> This turns out to be typical for many data structures. It's more
> difficult to get accurate numbers, but GCAC is especially slow when
> building a large PCH and I'm guessing that this is in part due to
> tree-to-tree pointers.
>
> This all made me wonder why we can't use the known hierarchy of the
> intermediate representations. Ideally, there should be only two
> classes of objects: global state variables (not ideal, but a reality)
> and objects reachable from the symbol table. AFAICT anything that
> isn't in one of those two categories has probably become garbage
> somewhere along the way.


By "hierarchy" you don't mean actual type hierarchy, right?  You seem
to be looking for a way of having the GC objects in a non-cyclic graph
so that we only walk each reachable object exactly once?


> It seems to be that this hierarchy could also be useful if/when GCC
> will move away from generated walkers (i.e. gengtype). If we can
> somehow enforce a hierarchy, the markers become much simpler. Also,
> the hierarchy could be used to verify "no leaks" from alloc pools, and
> probably for other things I can't think of right now.

Yes, this may be doable with the manual marking strategy, but it
requires some knowledge of the kind "my data structure has a pointer
to a basic block, and basic blocks are higher in the GC graph, so I
don't need to walk them".

> Another "problem" with gengtype is that it doesn't know what types can
> end up in a PCH. The CFG data structures can *never* be in a PCH, but
> there still are PCH writer functions. This is true for many other data
> structures as well. Has anyone ever measured/investigated what PCH
> writer functions are actually used? Should we (as an intermediate step
> towards streaming PCHs) explicitly GTY-mark types that should have PCH
> writer functions, and not generate such functions for unmarked types?

Yes.  With manual markers, we simply don't provide PCH markers.
Compile/link errors will point to places where we have a type escaping
into PCH.


Diego.

Re: Using known data structure hierarchy in GC and PCH?

Reply via email to