On Tue, 19 Jun 2018, Cary Coutant wrote:

> > On testcases like that from PR60243 CFG build is dominated by
> > assign_discriminators because it expands locations again and again
> > and this got more expensive over the time.
> >
> > Cary - can you explain the overall logic of assign_discriminators,
> > specifically why if the last stmt of a block has UNKNOWN_LOCATION
> > we don't need any discriminator?  That last stmt inherits the last
> > location from a previous stmt with location.  Also why
> > do we look at e->dest _last_ stmt in addition to the first stmt?
> 
> Sorry, it's been a long time since I looked at this. I think that test
> for UNKNOWN_LOCATION is just a way of punting on an unexpected
> condition; the rest of the function won't work unless we have a valid
> locus to start with.
> 
> > If I understand correctly the assignment is done at CFG construction
> > time rather than location emission time because we want it done
> > on a representation close to source?  So if the last stmt has
> > the same line then the first should have as well?
> 
> As for why we look at last_stmt as well as first_stmt, that has to do
> with the way for loops are expanded. See my explanation here:
> 
>    https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01450.html

Ah, OK.  I guess together with the fact that we're working on
an unoptimized CFG that makes sense.  It basically assumes
we have no line back-and-forth jumping at this stage which I'm
not 100% sure is a valid assumption.

Likewise we shouldn't have UNKNOWN_LOCATION stmts at this point
(but again I'm quite sure there are a few cases where we do...).

> > Or, to ask a different question - why is this done so early on
> > a non-optimized CFG rather than late on RTL right before we
> > emit .loc directives?
> 
> It has to be done early because we need to have discriminators
> assigned for the tree_profile pass, in order to use profile feedback
> from an earlier run.

OK.

> > It would be nice if you could expand the comment before
> > assign_discriminators in some way addressing the above.
> >
> > The patch below cuts the time spent in assign_discriminators
> > down by a factor of 2.5.  There's obvious cleanup possibilities
> > for the pointer hash-table given we should rather embed the
> > int, int pair rather than allocating it on the heap.  Couldn't
> > figure out the appropriate hash-traits quickly enough though.
> 
> This looks good, except won't the hash table now compare equal for two
> different locations where the line is the same, but the file is not?

This is what the previous state did as well, it just compared
(and hashed) LOCATION_LINE:

inline hashval_t
 locus_discrim_hasher::hash (const locus_discrim_map *item)
 {
-  return LOCATION_LINE (item->locus);
+  return item->location_line;
 }

locus_discrim_hasher::equal (const locus_discrim_map *a,
                             const locus_discrim_map *b)
 {
-  return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus);
+  return a->location_line == b->location_line;
 }

maybe that wasn't intended (or it was changed in previous
revs from sth more sensible).

> In the Google branches, we improved discriminator assignment quite a
> bit by tracking per instruction instead of per basic block. Here's my
> original patch to do that:
> 
>    https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00563.html

That somehow suggests assigning discriminators after optimizing
would help.  I don't see discriminator being used in the profile
code in the FSF tree btw., just some comments in auto-profile.c
that it could be used.

So I guess the issue is that you need to match the CFG at profile
reading time with the line-numbers associated to the final
assembly for sample-based profiling.  That indeed sounds like
a hard^Wimpossible task.  And indeed sth on a basic-block level
isn't going to help very much.

At least it suggests that we want to assign discriminators at
the same point we'll later read the sample-based profile rather
than at CFG construction.  There are quite a number of optimizations
run, including inlining, before pass_ipa_auto_profile.

Btw, does this mean that discriminators (and the compile-time used
to compute them) are not necessary if sample-based profiling isn't
used?

Thanks,
Richard.

Reply via email to