> On Wed, Dec 11, 2013 at 1:21 AM, Martin Liška <marxin.li...@gmail.com> wrote:
> > Hello,
> >    I prepared a collection of systemtap graphs for GIMP.
> >
> > 1) just my profile-based function reordering: 550 pages
> > 2) just -freorder-blocks-and-partitions: 646 pages
> > 3) just -fno-reorder-blocks-and-partitions: 638 pages
> >
> > Please see attached data.
> 
> Thanks for the data. A few observations/questions:
> 
> With both 1) (your (time-based?) reordering) and 2)
> (-freorder-blocks-and-partitions) there are a fair amount of accesses
> out of the cold section. I'm not seeing so many accesses out of the

Good point, I misread the description and assumed that 1) is time profiling +
reorder-blcoks-and-partition.

Martin, what version of GCC you used?  Rong introduced bug into libgcov
that made gcov streaming around fork to split summaries (so the number
of runs did not match).  I fixed it by
2013-11-18  Jan Hubicka  <j...@suse.cz>

        * libgcov-driver.c (run_accounted): Make global level static.
        (gcov_exit_merge_summary): Silence warning; do not clear
        run_accounted here.
        (gcov_exit): Clear it here.

        * libgcov-driver.c (gcov_exit_merge_summary): Fix setting
        run_accounted.

        * libgcov-driver.c (get_gcov_dump_complete): Update comments.
        (all_prg, crc32): Remove static vars.
        (gcov_exit_compute_summary): Rewrite to return crc32; do not clear
        all_prg.
        (gcov_exit_merge_gcda): Add crc32 parameter.
        (gcov_exit_merge_summary): Add crc32 and all_prg parameter;
        do not account run if it was already accounted.
        (gcov_exit_dump_gcov): Add crc32 and all_prg parameters.
        (gcov_exit): Initialize all_prg; update.

so please be sure you have this one in tree.  If you do, can you please repeat
the trick with locked unlikely section so we see why we get there even 
with -fno-reorder-blcoks-and-partition?

> cold section in the apps I am looking at with splitting enabled. In
> the case of splitting, it could either be non-representative profile
> data or profile data that isn't being maintained properly and lost,
> although I think I fixed most of those. If you have identified any of
> the cold split routines that are being executed in the case of 2) it
> would be interesting to look at the dumps.
> 
> With 2) there is also a big clump towards the end which is being
> executed out of the cold section, which again would be interesting to
> investigate.

I think the data towards the end comes from fact that Martin is manually
quiting gimp and at that time he may do it differently (and after different
delay) each time.  This makes the graphs harder to read, but one should
basically ignore everything after the huge gap that indicate that the app
has started.
> 
> Why is your function reordering in 1) accessing more out of the cold
> section than 3) (-fno-reorder-blocks-and-partitions)?

This seems like a scale differnce here (i.e. Martin took longer to quit
gimp in gimp_reorder_blocks_and_partition).  In the first gap you can see
several red dots aligned vertically.  I did not go into manually counting the
dots, but it seems that they should be about the same.
> 
> Both 2) and 3) have both normal .text and .text.hot, whereas 1) only
> has .text. I wonder if that is contributing to the higher number of
> pages either of these has compared to 1), since the non-cold addresses
> are distributed across both sections?

Looking at Martin's tree
https://github.com/marxin/gcc/blob/time-profiler-patch2/gcc/varasm.c#L536
he has a hack in disables startup/hot sections, but keeps unlikely.
I agree that for more comparable results we should keep all.

But first lets work out why we have unlikely section accesses in again...
Martin, is there any chance you can test these things on mainline rather
than patched trees?

Honza

Reply via email to