On Mon, Apr 11, 2016 at 3:42 PM, Nathan Sidwell <nat...@acm.org> wrote:
> This proto patch addresses 70594, where the constexpr machinery causes
> differences in -fcompare-debug checking. I'm looking for initial
> comments about the approach this patch is taking.
>
> As described in the comments of
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70594 the change is being
> caused by differing GC points with and without debug info generation.
> We generate different patterns of DECL_UID.
>
> With Patrick's patch to address 70542 constexpr_call_table is now
> gc-deletable, and he introduced a cache for constexpr fn bodies, also
> gc-deletable.
>
> This patch changes things so that both constexpr_call_table and
> fundef_copies_table are no longer gc-deletable. Instead we simply
> count the size of the constexpr_call_table and manually delete it if
> it gets 'too big'.
>
> The bulk of this patch is adding gc machinery to the
> fundef_copies_table_t type, as we now need to walk it during GC.
>
> I modified maybe_initialize_constexpr_call_table to delete the table
> if it is too big, and we're at the outermost level of a constexpr call
> evaluation stack. I picked a hard coded value for reasons I discuss
> below. It would be nice to delete the table only if we discover we're
> going to do call evaluation (rather than hit an already cached value),
> but that was a little awkward. So I punted.
>
> I've added a new hook to delete these two tables at the end of
> parsing. This is called from c_parse_final_cleanup. There's no point
> having them hang around during the rest of compilation.
>
> In addition to bootstrapping and testing, I repeated Patrick's
> experiment of building dwarf2out.c with -fcompare-debug. No changes
> were observed.
>
> Observations:
>
> 1) I think this approach is not globally best. We're inventing another
> memory-usage related heuristic, rather than relying on the already
> available GC machinery. That's why I've gone with a hard coded value
> for now, rather than add a new user-frobable param. The right solution
> is to stop generating new DECL_UIDs when copying fns for constexpr
> evaluation. IIUC the UID is being used to map decls to values. I don't
> see why we can't just use the (copied) DECL's address as a key to find
> its associated value (but I've not investigated that approach). It's
> not like we particularly care about the stability of that mapping
> between different compilations on different machines -- it doesn't
> affect code generation. Reproducibility of a class of compiler bugs
> would be reduced, but probably only marginally so,

This sounds like a good idea to me!
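Just to make sure I understand the suggestion, here's a rough
standalone sketch of the idea (plain C++ with stand-in types and names,
not the actual constexpr.c code): instead of minting fresh DECL_UIDs for
the copies and keying the decl -> value map by UID, the map would simply
be keyed by the copied decl's address, which is already unique for the
lifetime of the evaluation:

#include <unordered_map>

/* Stand-ins for the real tree/decl types -- purely illustrative,
   not the constexpr.c data structures.  */
struct decl { int uid; };
struct value { };

/* Today (conceptually): copying a function mints fresh DECL_UIDs, and
   the UID is what maps a copied parm/local decl to its value.  */
static std::unordered_map<int, value> values_by_uid;

/* The suggestion: key the map by the copy's address instead, so no
   new UIDs need to be generated at all.  */
static std::unordered_map<const decl *, value> values_by_addr;

static value *
lookup (const decl *d)
{
  auto it = values_by_addr.find (d);
  return it == values_by_addr.end () ? nullptr : &it->second;
}

As you say, the address isn't stable across runs, but that only matters
for reproducing a certain class of bugs, not for code generation.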

> 2) The constexpr_call_table and fundef_copies_table are closely
> related data structures. I think it would be cleaner to combine these
> into a single data structure. Possibly along with the call_stack and
> associated variables. Such a cleanup would be good even if we don't go
> with this approach to the defect, but could wait until 7.0. If we go
> with this approach, IMHO it would be good to do some of this merging
> now.

Hmm, my initial approach for PR c++/70452 actually tacked the freelist
onto the constexpr_fundef_table, see
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00086.html. The
implementation was switched to a separate fundef_copies_table only so
that it could be deleted during GC, which we now know is problematic.
Maybe it's worth going back to this approach? (I don't readily see how
the constexpr_call_table and fundef_copies_table can be merged, since
the former is indexed by more than just the function decl -- see the
rough sketch of the two keys at the bottom of this mail.)

> 3) In experimenting I looked at the 70542 testcase. Allowing the
> caches to grow unboundedly we end up with about 500K
> constexpr_call_table slots used, 20 fundef_copies slots all taking up
> about 55MB (out of about 65MB) of gc-able memory.
>
> In the 70594 testcase we use 127 constexpr_call_table slots, and about
> 1MB of gc-able memory.
>
> Picking a limit of 20000 constexpr_call_table slots causes several
> reallocations and interestingly actually led to faster parsing for
> 70542 of 6.8 seconds rather than 7.0 seconds. Presumably by keeping
> the reachable working set smaller.
>
> 4) We could definitely do better with the when-to-delete heuristic.
> For instance, not deleting if there's been no GC since the previous
> deletion event. That would reduce the deletions I observed in the
> 70542 testcase, as there were many more of them than GCs.
>
> comments?
>
> nathan
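For reference, the rough sketch of the two keys mentioned above (again
plain C++ with stand-in types, not the real constexpr.c structures).
The call cache has to distinguish f(1) from f(2), so its key is the
function together with its argument bindings, whereas the freelist of
body copies only cares which function is being evaluated:

#include <map>
#include <tuple>
#include <vector>

struct tree_node { };          /* stand-in for a decl/expression node */
typedef tree_node *tree;

/* Key of the call cache: the function _and_ its argument bindings,
   since the cached result depends on both.  */
struct constexpr_call_key
{
  tree fundef;
  std::vector<tree> bindings;
  bool operator< (const constexpr_call_key &o) const
  {
    return std::tie (fundef, bindings) < std::tie (o.fundef, o.bindings);
  }
};
static std::map<constexpr_call_key, tree> call_results;

/* Key of the copies freelist: the function decl alone.  */
static std::map<tree, std::vector<tree> > fundef_copies;

Tacking the freelist onto the constexpr_fundef_table entry, as in the
original patch, avoids needing a second decl-keyed table at all.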