On 2017.05.25 at 11:55 +0200, Martin Liška wrote:
> Hi.
>
> As I spoke about the PGO with Honza and Richi, current 3-stage is not ideal
> for following 2 reasons:
>
> 1) stageprofile compiler is trained just on libraries that are built during
>    stage2
> 2) apart from that, as the compiler is also used to build the final compiler,
>    profile is being updated during the build. So the stage2 compiler is
>    making different decisions.
>
> Both problems can be resolved by adding another step in between current
> stage2 and stage3 where we train stage2 compiler by building compiler with
> default options.
>
> I'm going to do some measurements.

I did some measurements on gcc67 (trunk with --enable-checking=release).
The apparent speedup is in the noise.

Without your patch:

 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (10 runs):

      15749.058451      task-clock (msec)         #     0.997 CPUs utilized           ( +- 0.13% )
             1,352      context-switches          #     0.086 K/sec                   ( +- 0.16% )
                 7      cpu-migrations            #     0.000 K/sec                   ( +- 5.73% )
           269,142      page-faults               #     0.017 M/sec                   ( +- 0.01% )
    60,676,581,181      cycles                    #     3.853 GHz                     ( +- 0.09% ) (83.35%)
    13,401,784,189      stalled-cycles-frontend   #    22.09% frontend cycles idle    ( +- 0.20% ) (83.33%)
    12,926,843,370      stalled-cycles-backend    #    21.30% backend cycles idle     ( +- 0.04% ) (83.31%)
    73,074,099,356      instructions              #      1.20 insn per cycle
                                                  #      0.18 stalled cycles per insn ( +- 0.02% ) (83.34%)
    16,607,220,814      branches                  #  1054.490 M/sec                   ( +- 0.03% ) (83.36%)
       616,673,310      branch-misses             #     3.71% of all branches         ( +- 0.08% ) (83.36%)

      15.803602619 seconds time elapsed                                               ( +- 0.14% )

With your patch:

 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (10 runs):

      15735.220610      task-clock (msec)         #     0.997 CPUs utilized           ( +- 0.11% )
             1,354      context-switches          #     0.086 K/sec                   ( +- 0.22% )
                 6      cpu-migrations            #     0.000 K/sec                   ( +- 6.67% )
           269,164      page-faults               #     0.017 M/sec                   ( +- 0.01% )
    60,723,862,242      cycles                    #     3.859 GHz                     ( +- 0.08% ) (83.35%)
    13,382,554,421      stalled-cycles-frontend   #    22.04% frontend cycles idle    ( +- 0.14% ) (83.31%)
    12,912,171,664      stalled-cycles-backend    #    21.26% backend cycles idle     ( +- 0.03% ) (83.34%)
    73,109,081,227      instructions              #      1.20 insn per cycle
                                                  #      0.18 stalled cycles per insn ( +- 0.03% ) (83.34%)
    16,590,421,798      branches                  #  1054.349 M/sec                   ( +- 0.02% ) (83.35%)
       616,669,135      branch-misses             #     3.72% of all branches         ( +- 0.08% ) (83.36%)

      15.788772466 seconds time elapsed                                               ( +- 0.12% )

-- 
Markus
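
For anyone who wants to check the "in the noise" reading, one rough way is to compare the relative change of each counter against the run-to-run variation that perf reports. The sketch below does this for three of the figures above. The helper names and the rule of thumb (a change counts as noise when it is smaller than the larger of the two "+-" percentages) are assumptions made for illustration, not a rigorous significance test.

```python
# Rough noise check for the perf stat figures quoted above.
# Assumption: a change is treated as noise if its magnitude is no larger
# than the bigger of the two reported "+-" percentages. This is a
# simplification, not a proper statistical test.

def relative_change_pct(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


def within_noise(before, after, noise_before_pct, noise_after_pct):
    """True if the relative change does not exceed the larger noise figure."""
    change = abs(relative_change_pct(before, after))
    return change <= max(noise_before_pct, noise_after_pct)


# (name, without patch, with patch, +- without, +- with), copied from the runs.
MEASUREMENTS = [
    ("seconds elapsed", 15.803602619, 15.788772466, 0.14, 0.12),
    ("task-clock (msec)", 15749.058451, 15735.220610, 0.13, 0.11),
    ("cycles", 60_676_581_181, 60_723_862_242, 0.09, 0.08),
]

if __name__ == "__main__":
    for name, before, after, nb, na in MEASUREMENTS:
        change = relative_change_pct(before, after)
        verdict = "noise" if within_noise(before, after, nb, na) else "significant"
        print(f"{name:20s} {change:+.3f}%  ->  {verdict}")
```

Under this rule all three counters come out as noise: elapsed time and task-clock drop by slightly less than 0.1%, while cycles rise by slightly less than 0.1%.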