On 2017.11.03 at 16:48 +0100, Jan Hubicka wrote:
> this is updated patch which I have committed after profiledbootstrapping x86-64

Unfortunately, compiling tramp3d-v4.cpp is 6-7% slower after this patch.
This happens with an LTO/PGO bootstrapped gcc using --enable-checking=release.

On X86_64:

Before:

 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      25040.360183  task-clock (msec)        #    1.000 CPUs utilized            ( +-  0.25% )
               650  context-switches         #    0.026 K/sec                    ( +- 76.87% )
                 2  cpu-migrations           #    0.000 K/sec                    ( +- 28.87% )
           268,141  page-faults              #    0.011 M/sec                    ( +-  0.01% )
    80,210,085,167  cycles                   #    3.203 GHz                      ( +-  0.26% )  (66.67%)
    21,061,765,388  stalled-cycles-frontend  #   26.26% frontend cycles idle     ( +-  0.37% )  (66.67%)
    24,699,976,439  stalled-cycles-backend   #   30.79% backend cycles idle      ( +-  0.57% )  (66.68%)
    69,167,169,243  instructions             #    0.86  insn per cycle
                                             #    0.36  stalled cycles per insn  ( +-  0.05% )  (66.68%)
    15,230,229,662  branches                 #  608.227 M/sec                    ( +-  0.06% )  (66.68%)
       986,612,296  branch-misses            #    6.48% of all branches          ( +-  0.07% )  (66.68%)

      25.046439011 seconds time elapsed                                          ( +-  0.25% )

After:

 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      26710.577065  task-clock (msec)        #    1.000 CPUs utilized            ( +-  0.27% )
               199  context-switches         #    0.007 K/sec                    ( +- 21.12% )
                 2  cpu-migrations           #    0.000 K/sec                    ( +- 14.29% )
           267,676  page-faults              #    0.010 M/sec                    ( +-  0.01% )
    85,561,962,974  cycles                   #    3.203 GHz                      ( +-  0.26% )  (66.66%)
    19,581,827,643  stalled-cycles-frontend  #   22.89% frontend cycles idle     ( +-  0.30% )  (66.66%)
    26,056,535,726  stalled-cycles-backend   #   30.45% backend cycles idle      ( +-  0.65% )  (66.68%)
    77,222,167,966  instructions             #    0.90  insn per cycle
                                             #    0.34  stalled cycles per insn  ( +-  0.04% )  (66.68%)
    17,471,652,187  branches                 #  654.110 M/sec                    ( +-  0.05% )  (66.69%)
     1,082,141,013  branch-misses            #    6.19% of all branches          ( +-  0.04% )  (66.69%)

      26.713823720 seconds time elapsed                                          ( +-  0.27% )

==================================================================================================================

On PPC64le:

Before:

 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      24281.894597  task-clock (msec)        #    0.989 CPUs utilized            ( +-  1.85% )
               166  context-switches         #    0.007 K/sec                    ( +-  2.46% )
                 5  cpu-migrations           #    0.000 K/sec                    ( +- 18.03% )
            52,908  page-faults              #    0.002 M/sec                    ( +- 11.61% )
    84,939,354,171  cycles                   #    3.498 GHz                      ( +-  1.82% )  (66.71%)
     4,680,693,343  stalled-cycles-frontend  #    5.51% frontend cycles idle     ( +-  8.75% )  (49.98%)
    46,697,372,688  stalled-cycles-backend   #   54.98% backend cycles idle      ( +-  2.06% )  (50.05%)
    94,990,460,746  instructions             #    1.12  insn per cycle
                                             #    0.49  stalled cycles per insn  ( +-  0.10% )  (66.72%)
    19,562,344,992  branches                 #  805.635 M/sec                    ( +-  0.07% )  (50.06%)
       807,701,262  branch-misses            #    4.13% of all branches          ( +-  0.45% )  (50.05%)

      24.550558669 seconds time elapsed                                          ( +-  1.83% )

After:

 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      26383.472582  task-clock (msec)        #    0.995 CPUs utilized            ( +-  1.83% )
               202  context-switches         #    0.008 K/sec                    ( +-  1.68% )
                 5  cpu-migrations           #    0.000 K/sec                    ( +- 14.29% )
            53,114  page-faults              #    0.002 M/sec                    ( +- 17.86% )
    92,099,443,793  cycles                   #    3.491 GHz                      ( +-  0.96% )  (66.68%)
     3,706,147,243  stalled-cycles-frontend  #    4.02% frontend cycles idle     ( +-  8.31% )  (50.00%)
    51,376,299,749  stalled-cycles-backend   #   55.78% backend cycles idle      ( +-  0.83% )  (50.05%)
   105,872,124,981  instructions             #    1.15  insn per cycle
                                             #    0.49  stalled cycles per insn  ( +-  0.05% )  (66.74%)
    22,348,839,937  branches                 #  847.077 M/sec                    ( +-  0.16% )  (50.04%)
       847,288,219  branch-misses            #    3.79% of all branches          ( +-  0.06% )  (50.02%)

      26.511790685 seconds time elapsed                                          ( +-  1.84% )

--
Markus
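
For anyone who wants to recompute the relative changes from the counters above, here is a minimal sketch. The numbers are copied from the perf stat output in this mail; the names `measurements`, `percent_change` and `summarize` are made up for illustration and are not part of perf or gcc. It only compares the reported means and ignores the "+-" variance columns, which on PPC64le (about 1.8% for elapsed time) are large enough to matter.

```python
# (before, after) means copied from the perf stat runs quoted in this mail.
measurements = {
    "x86_64": {
        "seconds elapsed": (25.046439011, 26.713823720),
        "cycles": (80_210_085_167, 85_561_962_974),
        "instructions": (69_167_169_243, 77_222_167_966),
        "branches": (15_230_229_662, 17_471_652_187),
        "branch-misses": (986_612_296, 1_082_141_013),
    },
    "ppc64le": {
        "seconds elapsed": (24.550558669, 26.511790685),
        "cycles": (84_939_354_171, 92_099_443_793),
        "instructions": (94_990_460_746, 105_872_124_981),
        "branches": (19_562_344_992, 22_348_839_937),
        "branch-misses": (807_701_262, 847_288_219),
    },
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def summarize(data):
    """Map each architecture to {counter: percent change, rounded to 0.1}."""
    return {
        arch: {
            name: round(percent_change(before, after), 1)
            for name, (before, after) in counters.items()
        }
        for arch, counters in data.items()
    }


if __name__ == "__main__":
    for arch, deltas in summarize(measurements).items():
        print(arch)
        for name, delta in deltas.items():
            print(f"  {name:<16} {delta:+.1f}%")
```

On both targets the executed instruction and branch counts grow by more than the elapsed time does, so the slowdown appears to come from the compiler doing more work rather than from a lower IPC.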