On 2017.11.07 at 00:12 +0100, Jan Hubicka wrote:
> > On 2017.11.05 at 11:55 +0100, Jan Hubicka wrote:
> > > > On 2017.11.03 at 16:48 +0100, Jan Hubicka wrote:
> > > > > this is updated patch which I have committed after
> > > > > profiledbootstrapping x86-64
> > > >
> > > > Unfortunately, compiling tramp3d-v4.cpp is 6-7% slower after this patch.
> > > > This happens with an LTO/PGO bootstrapped gcc using
> > > > --enable-checking=release.
> > >
> > > our periodic testers have also picked up the change and there is no
> > > compile time regression reported for tramp3d.
> > > https://gcc.opensuse.org/gcc-old/c++bench-czerny/tramp3d/
> > > so I would conclude that it is regression in LTO+PGO bootstrap. I am
> > > fixing one checking bug that may cause it (where we mix local and
> > > global profiles) so perhaps it will go away afterwards.
> >
> > Just to confirm: pure PGO bootstrap is fine, e.g. on Ryzen:
> > (LTO/PGO) 17.65 sec ( +- 0.68% )
> > (PGO)     15.74 sec ( +- 0.27% )
>
> Thanks. I have committed the patch for inlining profile update bug, so
> with some luck LTO/PGO may be fine again.

It got worse, unfortunately:

Pure PGO:

 Performance counter stats for '/home/trippels/gcc_8/usr/local/bin/g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      16213.529306      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.25% )
             1,387      context-switches          #    0.086 K/sec                    ( +-  0.17% )
                 4      cpu-migrations            #    0.000 K/sec                    ( +- 14.80% )
           261,764      page-faults               #    0.016 M/sec                    ( +-  0.03% )
    62,633,457,222      cycles                    #    3.863 GHz                      ( +-  0.20% )  (83.32%)
    13,990,050,204      stalled-cycles-frontend   #   22.34% frontend cycles idle     ( +-  0.51% )  (83.33%)
    13,189,755,888      stalled-cycles-backend    #   21.06% backend cycles idle      ( +-  0.04% )  (83.31%)
    75,194,592,630      instructions              #    1.20  insn per cycle
                                                  #    0.19  stalled cycles per insn  ( +-  0.03% )  (83.35%)
    17,113,639,942      branches                  # 1055.516 M/sec                    ( +-  0.02% )  (83.38%)
       634,471,544      branch-misses             #    3.71% of all branches          ( +-  0.07% )  (83.34%)

      16.226375499 seconds time elapsed                                          ( +-  0.24% )

LTO/PGO:

 Performance counter stats for '/home/trippels/gcc_8/usr/local/bin/g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      18622.496264      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.35% )
             1,592      context-switches          #    0.086 K/sec                    ( +-  0.32% )
                 4      cpu-migrations            #    0.000 K/sec                    ( +- 14.43% )
           261,370      page-faults               #    0.014 M/sec                    ( +-  0.12% )
    71,849,030,564      cycles                    #    3.858 GHz                      ( +-  0.08% )  (83.34%)
    15,987,209,604      stalled-cycles-frontend   #   22.25% frontend cycles idle     ( +-  0.47% )  (83.32%)
    14,336,345,458      stalled-cycles-backend    #   19.95% backend cycles idle      ( +-  0.05% )  (83.33%)
    87,674,608,740      instructions              #    1.22  insn per cycle
                                                  #    0.18  stalled cycles per insn  ( +-  0.01% )  (83.36%)
    20,610,950,144      branches                  # 1106.777 M/sec                    ( +-  0.01% )  (83.35%)
       638,454,497      branch-misses             #    3.10% of all branches          ( +-  0.08% )  (83.35%)

      18.644370559 seconds time elapsed                                          ( +-  0.38% )

--
Markus
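
To put numbers on the gap between the two runs, the counters can be compared directly. The following is only a minimal sketch and was not part of the original measurements: the helper name `relative_change` and the `counters` table are made up for illustration, and the values are copied by hand from the two `perf stat` outputs in this message.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return the change of candidate relative to baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Hand-copied from the perf stat output: (pure PGO, LTO/PGO).
counters = {
    "seconds elapsed": (16.226375499, 18.644370559),
    "cycles": (62_633_457_222, 71_849_030_564),
    "instructions": (75_194_592_630, 87_674_608_740),
    "branches": (17_113_639_942, 20_610_950_144),
    "branch-misses": (634_471_544, 638_454_497),
}

# Percentage change of the LTO/PGO compiler relative to the pure PGO one.
changes = {
    name: relative_change(pgo, lto_pgo)
    for name, (pgo, lto_pgo) in counters.items()
}

for name, pct in changes.items():
    print(f"{name:16s} {pct:+6.2f}%")
```

By this calculation the LTO/PGO compiler needs roughly 15% more wall-clock time and cycles, about 17% more instructions and about 20% more branches, while branch misses stay nearly flat. Together with the almost unchanged insn per cycle (1.20 vs 1.22), this suggests the slowdown comes from executing more instructions rather than from additional stalls, though the counters alone do not show where the extra work is spent.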