https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85390
Bug ID: 85390
Summary: possible missed optimisation / regression from 6.3 with conditional expression
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vegard.nossum at oracle dot com
Target Milestone: ---

Input:

extern int a, b, c;

int f(int x)
{
        __builtin_prefetch((void *) (x ? a : b));
        return c;
}

Current trunk with -O3 produces this:

f(int):
        testl   %edi, %edi
        je      .L2
        movslq  a(%rip), %rax
        prefetcht0      (%rax)
        movl    c(%rip), %eax
        ret
.L2:
        movslq  b(%rip), %rax
        prefetcht0      (%rax)
        movl    c(%rip), %eax
        ret

While 6.3.0 did not have a branch:

f(int):
        movslq  a(%rip), %rdx
        movslq  b(%rip), %rax
        testl   %edi, %edi
        cmovne  %rdx, %rax
        prefetcht0      (%rax)
        movl    c(%rip), %eax
        ret

For reference, clang also outputs a branchless (but slightly longer) version:

f(int):                                 # @f(int)
        testl   %edi, %edi
        movl    $a, %eax
        movl    $b, %ecx
        cmovneq %rax, %rcx
        movslq  (%rcx), %rax
        prefetcht0      (%rax)
        movl    c(%rip), %eax
        retq

In my tests, the 6.3.0 code is equally fast in the x == 0 and x != 0 cases, whereas trunk/8.0.1 is only half as fast as 6.3.0 in the x == 0 (branch taken) case. In the branch not taken case, the 8.0.1 code has the same speed as the 6.3.0 code.
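
The report does not say how the timings were taken. As a rough sketch only, a comparison of the two cases could be set up along the following lines. Everything here beyond f() itself is an assumption made for illustration: the definitions of a, b and c, the noinline attribute, the added intptr_t cast (only there to keep the int-to-pointer conversion warning-free in C), and the hypothetical helper time_calls(), which uses the standard clock() function.

```c
#include <stdint.h>
#include <time.h>

/* Assumed definitions for the report's extern declarations. */
int a, b, c;

/* Same body as in the report; noinline (an assumption) keeps the
 * compiler from folding the call into the timing loop, and the
 * intptr_t cast only silences the int-to-pointer size warning. */
__attribute__((noinline)) int f(int x)
{
        __builtin_prefetch((void *)(intptr_t)(x ? a : b));
        return c;
}

/* Hypothetical helper: CPU seconds spent on `iterations` calls of
 * f(x) with a fixed x. */
double time_calls(int x, long iterations)
{
        volatile int sink = 0;  /* keeps the return values live */
        clock_t start = clock();

        for (long i = 0; i < iterations; i++)
                sink += f(x);

        clock_t end = clock();
        (void)sink;
        return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building this with each compiler at -O3 and comparing time_calls(0, n) against time_calls(1, n) for a large n gives one number per case. A loop with a fixed x is very friendly to the branch predictor, so this sketch may not reproduce the slowdown described above.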