https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80952
Martin Liška <marxin at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|missed-optimization |
Status|UNCONFIRMED |NEW
Last reconfirmed| |2017-06-02
Ever confirmed|0 |1
--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed, it's caused by the -fprofile-update option added in GCC 7.1:
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
When one uses -pthread, the default is -fprofile-update=atomic.
That mode guarantees that the collected profile is not corrupted, because
non-atomic counter updates from multiple threads are racy.
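To make the race concrete, here is a minimal sketch (not GCC's actual
instrumentation). It replays, in a single thread, one bad interleaving of two
non-atomic increments and compares it with two atomic increments. The function
names are hypothetical, and the plain local variable only stands in for a gcov
counter.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical illustration: a non-atomic "counter += 1" is really
// load, add, store. Replay by hand an interleaving in which threads A and B
// both load the old value before either one stores.
std::int64_t lost_update_demo() {
    std::int64_t counter = 0;
    std::int64_t seen_by_a = counter;  // A loads 0
    std::int64_t seen_by_b = counter;  // B loads 0
    counter = seen_by_a + 1;           // A stores 1
    counter = seen_by_b + 1;           // B stores 1, A's increment is lost
    return counter;
}

// With an atomic read-modify-write, neither increment can be lost.
// Relaxed ordering is assumed to be enough for a pure event counter.
std::int64_t atomic_update_demo() {
    std::atomic<std::int64_t> counter{0};
    counter.fetch_add(1, std::memory_order_relaxed);
    counter.fetch_add(1, std::memory_order_relaxed);
    return counter.load();
}
```

The first function ends with a count of 1 after two increments, the second
with 2. Lost updates of this kind are what make edge counts inconsistent with
each other.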
Using -fprofile-generate with -fprofile-update=single instead yields a
corrupted profile:
pr80952.cpp: In function ‘main._omp_fn.0’:
pr80952.cpp:40:1: error: corrupted profile info: profile data is not flow-consistent
}
^
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 3-4 thought to be 32
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 3-13 thought to be -2
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 11-8 thought to be -239929
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 11-12 thought to be 247808373
Running perf confirms that the locked counter updates are the bottleneck:
0.00 : 4014a3: lock addq $0x1,0x204024(%rip) # 6054d0 <__gcov0.main._omp_fn.0+0x30>
49.28 : 4014ac: cmp %ecx,%r8d
0.00 : 4014af: jl 4014c3 <main._omp_fn.0+0x93>
0.00 : 4014b1: lock addq $0x1,0x203ffe(%rip) # 6054b8 <__gcov0.main._omp_fn.0+0x18>
: {
: if (dividend % divisor == 0) {
50.12 : 4014ba: mov %esi,%eax
0.01 : 4014bc: cltd
Well, I had planned to provide a profile update method in which each function
keeps function-local counters that are merged into the global counters at
function exit.
That would definitely help in this example.
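A rough sketch of that idea, assuming one global counter per edge and one
merge at function exit. All names here (g_counter, work_atomic_each,
work_local_merge, run) are hypothetical and do not correspond to anything in
GCC or libgcov.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical global counter, standing in for one __gcov0 slot.
static std::atomic<std::int64_t> g_counter{0};

// Roughly what the atomic mode does today: one locked
// read-modify-write on shared memory per loop iteration.
void work_atomic_each(int iterations) {
    for (int i = 0; i < iterations; ++i)
        g_counter.fetch_add(1, std::memory_order_relaxed);
}

// Sketch of the proposed scheme: count in a function-local variable
// and merge into the global counter once, at function exit.
void work_local_merge(int iterations) {
    std::int64_t local = 0;
    for (int i = 0; i < iterations; ++i)
        ++local;
    g_counter.fetch_add(local, std::memory_order_relaxed);
}

// Run `worker` on several threads and return the final global count.
std::int64_t run(void (*worker)(int), int threads, int iterations) {
    g_counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker, iterations);
    for (auto &th : pool)
        th.join();
    return g_counter.load();
}
```

Both workers produce the same total, but the second issues one atomic
operation per call rather than one per iteration, so threads no longer contend
on the counter's cache line inside the hot loop.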