> Hello,
> I did a collection of systemtap graphs for GIMP.
>
> All these graphs were created with enabled LTO, profiling and -O2.
>
> 1) gimp-reordered.pdf - functions are reordered according to my newly
> created profile that utilizes LTO infrastructure
> 2) gimp-no-top-level-reorder.pdf - (GCC rev. 201648) -fno-top-level-reorder
> 3) gimp-top-level-reorder.pdf - (GCC rev. 201648) -ftop-level-reorder

Thanks for the graphs! gimp-top-level-reorder seems to be bogus (it shows
accesses into dynstr only). To catch the -fno-reorder-blocks-partition
problem, perhaps you can modify Martin's linker script to make the
.text.unlikely section non-executable. This way the application will crash
every time we jump into it.

Honza
>
> Honza has an idea how to minimize hot text section and I will send new
> graphs for the proposed patch.
> Moreover, I will send graphs for Inkscape which is written in C++.
>
> Have a nice day,
> Martin
>
> On 11 August 2013 15:25, Teresa Johnson <tejohn...@google.com> wrote:
> > Cc'ing Rong since he is also working on trying to address the comdat
> > profile issue. Rong, you may need to see an earlier message for more
> > context:
> > http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00558.html
> >
> > Teresa
> >
> > On Sun, Aug 11, 2013 at 5:21 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >>>
> >>> I see, yes LTO can deal with this better since it has global
> >>> information. In non-LTO mode (including LIPO) we have the issue.
> >>
> >> Either Martin or I will implement merging of the multiple copies at
> >> LTO link time. This is needed for Martin's code unification patch anyway.
> >>
> >> Theoretically gcov runtime can also have symbol names and cfg checksums of
> >> comdats in the static data and at exit produce buckets based on matching
> >> names+checksums+counter counts, merge all data in each bucket into one
> >> representative by the existing merging routines and then memcpy them to
> >> all the original copies. This way all compilation units will receive same
> >> results.
> >>
> >> I am not very keen about making gcov runtime bigger and more complex than
> >> it needs to be, but having sane profile for comdats seems quite important.
> >> Perhaps, in GNU toolchain, ordered subsections can be used to make linker
> >> produce an ordered list of comdats, so the runtime won't need to do
> >> hashing + lookups.
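
To make the bucket idea quoted above concrete, here is a minimal sketch in Python. It is not gcov runtime code: `ComdatCopy`, `unify_comdat_profiles` and the field names are hypothetical, and plain summation stands in for "the existing merging routines" (real gcov counters have several merge functions, so this only models additive counters).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class ComdatCopy:
    """One compilation unit's copy of a comdat function's profile (hypothetical)."""
    unit: str            # object file that holds this copy
    name: str            # symbol name of the comdat function
    cfg_checksum: int    # checksum of the function's CFG
    counters: List[int]  # arc counters collected for this copy


def unify_comdat_profiles(copies: List[ComdatCopy]) -> List[ComdatCopy]:
    """Give every copy in a bucket the same merged counters."""
    # Bucket on name + checksum + number of counters, as in the mail.
    buckets = defaultdict(list)
    for copy in copies:
        key = (copy.name, copy.cfg_checksum, len(copy.counters))
        buckets[key].append(copy)

    for group in buckets.values():
        # Assumption: additive merge (sum counter i across all copies).
        merged = [sum(vals) for vals in zip(*(c.counters for c in group))]
        # The "memcpy back" step: every copy receives the merged result.
        for copy in group:
            copy.counters = list(merged)
    return copies


if __name__ == "__main__":
    profile = [
        ComdatCopy("a.o", "foo", 0x1, [1, 2]),
        ComdatCopy("b.o", "foo", 0x1, [3, 4]),
        ComdatCopy("c.o", "foo", 0x2, [5, 6]),  # different CFG, own bucket
    ]
    for copy in unify_comdat_profiles(profile):
        print(copy.unit, copy.name, copy.counters)
```

A copy whose checksum or counter count differs lands in its own bucket and is left alone, which is the point of keying on all three values.
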
> >>
> >> Honza
> >>>
> >>> I take it gimp is built with LTO and therefore shouldn't be hitting
> >>> this comdat issue?
> >>>
> >>> Let me do a couple things:
> >>> - port over my comdat inlining fix from the google branch to trunk and
> >>> send it for review. If you or Martin could try it to see if it helps
> >>> with function splitting to avoid the hits from the cold code that
> >>> would be great
> >>> - I'll add some new sanity checking to try to detect non-zero blocks
> >>> in the cold section, or 0 blocks reached by non-zero edges and see if
> >>> I can flush out any problems with my tests or a profiledbootstrap or
> >>> gimp.
> >>> - I'll try building and profiling gimp myself to see if I can
> >>> reproduce the issue with code executing out of the cold section.
> >>>
> >>> Thanks,
> >>> Teresa
> >>>
> >>> >>
> >>> >> Also, can you send me reproduction instructions for gimp? I don't
> >>> >> think I need Martin's patch, but which version of gimp and what is the
> >>> >> equivalent way for me to train it? I have some scripts to generate a
> >>> >> similar type of instruction heat map graph that I have been using to
> >>> >> tune partitioning and function reordering. Essentially it uses linux
> >>> >> perf to sample on instructions_retired and then munge the data in
> >>> >> several ways to produce various stats and graphs. One thing that has
> >>> >> been useful has been to combine the perf data with nm output to
> >>> >> determine which cold functions are being executed at runtime.
> >>> >
> >>> > Martin?
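
For reference, combining sampled addresses with nm output could look roughly like the Python sketch below. These are not Teresa's scripts. It assumes the sample addresses have already been extracted from the perf data as integers, that the symbols come from plain `nm` output ("address type name"), and that the cold section's address range is known from elsewhere (for example the section headers); `parse_nm` and `cold_hits` are hypothetical names.

```python
import bisect
from collections import Counter
from typing import List, Tuple


def parse_nm(nm_text: str) -> List[Tuple[int, str]]:
    """Return (address, name) pairs for text symbols, sorted by address."""
    symbols = []
    for line in nm_text.splitlines():
        parts = line.split()
        # Keep only "address type name" lines whose type is text (t/T).
        if len(parts) == 3 and parts[1] in ("t", "T"):
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def cold_hits(symbols: List[Tuple[int, str]], samples: List[int],
              cold_start: int, cold_end: int) -> Counter:
    """Count samples that fall in [cold_start, cold_end), per symbol."""
    addresses = [addr for addr, _ in symbols]
    hits = Counter()
    for ip in samples:
        if not (cold_start <= ip < cold_end):
            continue  # sample is outside the cold section
        # Nearest symbol at or below the sampled address.
        index = bisect.bisect_right(addresses, ip) - 1
        if index >= 0:
            hits[symbols[index][1]] += 1
    return hits


if __name__ == "__main__":
    # Made-up nm output and sample addresses, for illustration only.
    nm_output = (
        "0000000000001000 T main\n"
        "0000000000002000 t error_path\n"
        "0000000000002040 t usage\n"
        "0000000000003000 D data\n"
    )
    samples = [0x1004, 0x2010, 0x2010, 0x2044]
    for name, count in cold_hits(parse_nm(nm_output), samples,
                                 0x2000, 0x3000).most_common():
        print(name, count)
```

Any symbol that shows up here was executed during the run even though it was placed in the cold section, which is the signal being looked for.
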
> >>> >
> >>> >>
> >>> >> However, for this to tell me which split cold bbs are being executed I
> >>> >> need to use a patch that Sri sent for review several months back that
> >>> >> gives the split cold section its own name:
> >>> >> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
> >>> >> Steven had some follow up comments that Sri hasn't had a chance to
> >>> >> address yet:
> >>> >> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
> >>> >> (cc'ing Sri as we should probably revive this patch soon to address
> >>> >> gdb and other issues with detecting split functions properly)
> >>> >
> >>> > Interesting, I used a linker script for this purpose, but that is GNU ld
> >>> > only...
> >>> >
> >>> > Honza
> >>> >>
> >>> >> Thanks!
> >>> >> Teresa
> >>> >>
> >>> >> >
> >>> >> > Honza
> >>> >> >>
> >>> >> >> Thanks,
> >>> >> >> Teresa
> >>> >> >>
> >>> >> >> > I think we are really looking primarily for dead parts of the
> >>> >> >> > functions (sanity checks/error handling)
> >>> >> >> > that should not be visited by train run. We can then see how to
> >>> >> >> > make the heuristic more aggressive?
> >>> >> >> >
> >>> >> >> > Honza
> >>> >> >>
> >>> >> >> --
> >>> >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413