mzolotukhin added a comment.

> What kinds of <some part> would be useful to you? (How do you want the 
> runtime of Clang broken down?) Are vertical slices (through all Clang's 
> various layers and potentially parts of LLVM) -- as this patch will produce 
> -- useful, or would you really want horizontal slices (as in, what part of 
> Clang is actually spending the time)? Or just anything that's basically 
> expected to be consistent from run to run, so you can identify that something 
> has slowed down, even if you don't have enough information to really know 
> what?

For me "something has slowed down" would be enough. I.e. even if "parse 
templates" would be erroneously attributed to 90% time spent in front-end, I 
would be able to see a jump from 90% to 95%. While these numbers are not 
reflecting the actual ratio, they still will indicate changes.

I would find horizontal slicing more interesting, as we can get vertical slices 
from profilers. I.e. if in LLVM profile I see that time spent in APInt::add has 
grown a lot, I'd like to know which pass started to use it more extensively - 
it's rarely the case that APInt::add slowed down itself.

> I think we need to fix the overlap issue as a prerequisite to adding timers 
> with large amounts of overlap, especially self-overlap. Otherwise the numbers 
> will likely do more harm than good, as they will significantly misattribute 
> runtime. Fortunately, I think that should only require some relatively small 
> changes to the LLVM timer infrastructure.

I definitely would support that :)

> Well, that depends on the profiler. With some profilers, you can just attach 
> a profiler to your entire compilation and then ask it what the hottest 
> functions were. But sure, if you have existing infrastructure built around 
> -ftime-report, I can see that it makes sense to reuse that infrastructure.

Yeah, that was exactly my point.

> While LLVM may have some overlap between regions, the vertical timing slices 
> are still pretty well aligned with the horizontal functionality slices. I 
> expect the problems in Clang to be a lot worse. Suppose you enter N levels of 
> parsing templates, and then trigger a whole bunch of template instantiation, 
> AST deserialization, and code generation. Let's say that takes 1s in total. 
> With this patch, I think we'd end up attributing Ns of compile time to 
> "parsing templates", which is clearly very far from the truth, but will 
> likely be listed as (by far) the slowest thing in the compilation. This does 
> not even rise to the level of "not-perfect", it's going to be actively 
> misleading in a lot of cases, and won't even necessarily point anywhere near 
> the problematic spot.
>  I think with this patch and no fix to the overlap problem, we will find 
> ourselves frequently fielding bugs where people say "X is slow" when it 
> actually isn't. Plus I think we have the opportunity to deliver something 
> that's hugely useful and not much harder to achieve (providing timing 
> information that relates back to the source code), and I'd prefer we spend 
> our complexity budget for this feature there.

I see your point, and I agree it would be really misleading if e.g the sum of 
timers won't match the total time due to self-overlaps. I think adding these 
timers still might be worthwhile as my guess is that no one usually uses them 
anyway unless they know what to expect from the reported timers, but I might be 
mistaken here. And anyway, if you think it's possible to fix the issue first, 
it's totally fine with me.

Thanks,
Michael


https://reviews.llvm.org/D43578



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to