On Mon, Dec 10, 2012 at 1:25 PM, Steven Bosscher <stevenb....@gmail.com> wrote: > Hello, > > While trying to bootstrap with GCAC checking enabled and some > instrumentation to measure how often objects are being marked, I > noticed that a lot of cache misses happen because already-marked > objects are being tested again (e.g. via ggc_marked_p). This struck me > as rather odd, until I looked at what the GC markers for my favorite > data structures (the CFG) look like. > > Basic blocks live in GC space because they can point to other GC-space > objects (rtx, gimple), and these objects can point back to the basic > block they're contained in. In addition, the (function-specific) > basic_block_info array points to all live basic blocks and of course > the edges have basic_block pointers as well. Finally, loop have > pointers to the loop header and latch basic_blocks. > > The effect is that, on average, ggc_marked_p or one of its variants > gets called 17 times on each basic_block, *per ggc_collect call*! > > This turns out to be typical for many data structures. It's more > difficult to get accurate numbers, but GCAC is especially slow when > building a large PCH and I'm guessing that this is in part due to > tree-to-tree pointers. > > This all made me wonder why we can't use the known hierarchy of the > intermediate representations. Ideally, there should be only two > classes of objects: global state variables (not ideal, but a reality) > and objects reachable from the symbol table. AFAICT anything that > isn't in one of those two categories has probably become garbage > somewhere along the way.
By "hierarchy" you don't mean actual type hierarchy, right? You seem to be looking for a way of having the GC objects in a non-cyclic graph so that we only walk each reachable object exactly once? > It seems to be that this hierarchy could also be useful if/when GCC > will move away from generated walkers (i.e. gengtype). If we can > somehow enforce a hierarchy, the markers become much simpler. Also, > the hierarchy could be used to verify "no leaks" from alloc pools, and > probably for other things I can't think of right now. Yes, this may be doable with the manual marking strategy, but it requires some knowledge of the kind "my data structure has a pointer to a basic block, and basic blocks are higher in the GC graph, so I don't need to walk them". > Another "problem" with gengtype is that it doesn't know what types can > end up in a PCH. The CFG data structures can *never* be in a PCH, but > there still are PCH writer functions. This is true for many other data > structures as well. Has anyone ever measured/investigated what PCH > writer functions are actually used? Should we (as an intermediate step > towards streaming PCHs) explicitly GTY-mark types that should have PCH > writer functions, and not generate such functions for unmarked types? Yes. With manual markers, we simply don't provide PCH markers. Compile/link errors will point to places where we have a type escaping into PCH. Diego.