On Tue, 19 Jun 2018, Cary Coutant wrote: > > On testcases like that from PR60243 CFG build is dominated by > > assign_discriminators because it expands locations again and again > > and this got more expensive over the time. > > > > Cary - can you explain the overall logic of assign_discriminators, > > specifically why if the last stmt of a block has UNKNOWN_LOCATION > > we don't need any discriminator? That last stmt inherits the last > > location from a previous stmt with location. Also why > > do we look at e->dest _last_ stmt in addition to the first stmt? > > Sorry, it's been a long time since I looked at this. I think that test > for UNKNOWN_LOCATION is just a way of punting on an unexpected > condition; the rest of the function won't work unless we have a valid > locus to start with. > > > If I understand correctly the assignment is done at CFG construction > > time rather than location emission time because we want it done > > on a representation close to source? So if the last stmt has > > the same line then the first should have as well? > > As for why we look at last_stmt as well as first_stmt, that has to do > with the way for loops are expanded. See my explanation here: > > https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01450.html
Ah, OK. I guess together with the fact that we're working on an unoptimized CFG that makes sense. It basically assumes we have no line back-and-forth jumping at this stage which I'm not 100% sure is a valid assumption. Likewise we shouldn't have UNKNOWN_LOCATION stmts at this point (but again I'm quite sure there are a few cases where we do...). > > Or, to ask a different question - why is this done so early on > > a non-optimized CFG rather than late on RTL right before we > > emit .loc directives? > > It has to be done early because we need to have discriminators > assigned for the tree_profile pass, in order to use profile feedback > from an earlier run. OK. > > It would be nice if you could expand the comment before > > assign_discriminators in some way addressing the above. > > > > The patch below cuts the time spent in assign_discriminators > > down by a factor of 2.5. There's obvious cleanup possibilities > > for the pointer hash-table given we should rather embed the > > int, int pair rather than allocating it on the heap. Couldn't > > figure out the appropriate hash-traits quickly enough though. > > This looks good, except won't the hash table now compare equal for two > different locations where the line is the same, but the file is not? This is what the previous state did as well, it just compared (and hashed) LOCATION_LINE: inline hashval_t locus_discrim_hasher::hash (const locus_discrim_map *item) { - return LOCATION_LINE (item->locus); + return item->location_line; } locus_discrim_hasher::equal (const locus_discrim_map *a, const locus_discrim_map *b) { - return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus); + return a->location_line == b->location_line; } maybe that wasn't intended (or it was changed in previous revs from sth more sensible). > In the Google branches, we improved discriminator assignment quite a > bit by tracking per instruction instead of per basic block. Here's my > original patch to do that: > > https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00563.html That somehow suggests assigning discriminators after optimizing would help. I don't see discriminator being used in the profile code in the FSF tree btw., just some comments in auto-profile.c that it could be used. So I guess the issue is that you need to match the CFG at profile reading time with the line-numbers associated to the final assembly for sample-based profiling. That indeed sounds like a hard^Wimpossible task. And indeed sth on a basic-block level isn't going to help very much. At least it suggests that we want to assign discriminators at the same point we'll later read the sample-based profile rather than at CFG construction. There are quite a number of optimizations run, including inlining, before pass_ipa_auto_profile. Btw, does this mean that discriminators (and the compile-time used to compute them) are not necessary if sample-based profiling isn't used? Thanks, Richard.