https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047
Bug ID: 87047
Summary: gcc 7 & 8 - performance regression because of
if-conversion
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: already5chosen at yahoo dot com
Target Milestone: ---
Created attachment 44570
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44570&action=edit
demonstrate performance regression because of if-conversion
Very significant performance regression from gcc 6.x to 7.x and 8.x, caused by
if-conversion of a predictable branch.
Compilation flags: -O2 -Wall
Target: x86-64 (my test machine is IvyBridge)
It is possible that the problem is specific to the x86-64 target. I tested the
aarch64 target (by inspecting compiler output) and it looks o.k.
The problem occurs here:
  if ((i & 15)==0) {
    const uint64_t PROD_ONE = (uint64_t)(1) << 19;
    uint64_t prod = umulh(invRange, range);
    invRange = umulh(invRange, (PROD_ONE*2-1-prod)<<44)<<1;
  }
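(The report does not define umulh; presumably it returns the high 64 bits of the full 128-bit product. A minimal sketch of such a helper, assuming gcc's unsigned __int128 extension on x86-64:)

```c
#include <stdint.h>

/* Assumed helper, not shown in the report: high 64 bits of a
   64x64 -> 128-bit unsigned multiply. gcc compiles this to a
   single MUL/UMULH-style instruction on 64-bit targets. */
static inline uint64_t umulh(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
```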
The condition has low probability and is easily predicted by the branch
predictor, while the code inside the if has relatively high latency.
gcc, starting from 7.x and up to the latest, is convinced that always
executing the body of the if is a bright idea. Measurements on my real-world
code do not agree: they show a 30% slowdown. I'm sure that on artificial
sequences I could demonstrate a slowdown of 100% or more.
What is special about this case is that the compiler is VERY confident in its
stupid decision. It does not change its mind even when I replace
  if ((i & 15)==0) {
by
  if (__builtin_expect((i & 15)==0, 0)) {
I found only two ways to force sane code generation:
1. -fno-if-conversion
2.
  if ((i & 15)==0) {
    asm volatile("");
    ...
  }