On 12/9/19 2:03 PM, Jan Hubicka wrote:
On 12/9/19 1:14 PM, Martin Liška wrote:
Hello.

Based on a presentation that Sriraman Tallam gave at an LLVM conference:
https://www.youtube.com/watch?v=DySuXFGmB40

I made a heatmap based on executed instruction addresses. I used
$ perf record -F max -- ./cc1plus -fpreprocessed /home/marxin/Programming/tramp3d/tramp3d-v4.ii
and
$ perf script -F time,ip,dso

I'm sending a link to heatmaps for my system GCC 9 (PGO+lean LTO
bootstrap) and for GCC 10 before and after my reorder patch (also
PGO+lean LTO bootstrap).

One can see quite significant clustering starting at 5 s and lasting
until the end of compilation.
Link: https://drive.google.com/open?id=1M0YlxvQPyiVguy5VWRC8dG52UArwAuKS

Martin

For completeness, the heatmap was generated with the following script:
https://github.com/marxin/script-misc/blob/master/binary-heatmap.py
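
The linked script is not reproduced here. As a rough illustration of the
idea only (not the script's actual code), the Python sketch below assumes
each `perf script -F time,ip,dso` line looks like
` 1234.567890:   55d3c0a1b2c3 (/path/to/cc1plus)` and bins the samples of
one DSO into an address-by-time count matrix that could then be plotted.
The function names, the line format and the bin sizes are assumptions.

```python
import re
from collections import Counter

# Assumed layout of a `perf script -F time,ip,dso` line:
# " 1234.567890:   55d3c0a1b2c3 (/path/to/cc1plus)"
LINE_RE = re.compile(r"^\s*(\d+\.\d+):\s+([0-9a-fA-F]+)\s+\((.*)\)\s*$")


def parse_samples(lines, dso_suffix):
    """Yield (time_seconds, address) for samples whose DSO ends with dso_suffix."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip headers and malformed lines
        time_s, ip, dso = m.groups()
        if dso.endswith(dso_suffix):
            yield float(time_s), int(ip, 16)


def heatmap(samples, time_bin=0.1, addr_bin=4096):
    """Count samples per (address bucket, time bucket).

    Returns a matrix whose rows are address buckets (relative to the lowest
    sampled address) and whose columns are time buckets (relative to the
    first sample).
    """
    samples = list(samples)
    if not samples:
        return []
    t0 = min(t for t, _ in samples)
    a0 = min(a for _, a in samples)
    counts = Counter(
        (int((a - a0) // addr_bin), int((t - t0) // time_bin)) for t, a in samples
    )
    rows = max(r for r, _ in counts) + 1
    cols = max(c for _, c in counts) + 1
    return [[counts.get((r, c), 0) for c in range(cols)] for r in range(rows)]


# Usage on a tiny hand-written trace; the libc sample is filtered out.
trace = [
    " 100.000000:   400000 (/usr/bin/cc1plus)",
    " 100.050000:   400010 (/usr/bin/cc1plus)",
    " 100.150000:   402000 (/usr/bin/cc1plus)",
    " 100.160000:   7f0000 (/usr/lib/libc.so.6)",
]
m = heatmap(parse_samples(trace, "cc1plus"), time_bin=0.1, addr_bin=0x1000)
print(m)  # [[2, 0], [0, 0], [0, 1]]
```

Binning addresses relative to the lowest sampled address keeps the matrix
small; a real tool would more likely use the symbol or section layout of
the binary to choose the address range.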

Thanks, this looks really useful, as we had almost no way to check code
layout ever since your SystemTap script stopped working.

Great, thanks.


At first glance, the difference between gcc9 and gcc10 is explained
by the changes to profile updating: gcc9 makes very small cold
partitions compared to gcc10.  It is very nice that we now have a way to
measure this. I will also check whether some of the more important
profile updating fixes make sense to backport to gcc9.

Over the weekend I made some fixes to tp reordering, so it may be worth
updating your tests, but I will also try to run them myself.

In general, one can see the individual stages of compilation on the
graph: parsing, early lowering, early opts.  On bigger programs this
should be more visible.  I will give it a try.

You haven't replied to my question: do we want to let ipa-reorder into
trunk based on the images I sent for the GCC 10 PGO+LTO bootstrap?

Martin


Honza

