http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48297
Summary: Suboptimal optimization of boolean expression addition
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: [email protected]
ReportedBy: [email protected]
gcc does not seem to recognize that boolean expressions, such as (a==b), are at
most 1. At least not when performing math on them.
Take the following function:
int foo( int a, int b, int c, int x )
{
return (a==x)+(b==x)+(c==x);
}
With -O3 -fomit-frame-pointer -march=core2 on x86_64, gcc compiles this to:
0: 39 cf cmp edi,ecx
2: 40 0f 94 c7 sete dil
6: 31 c0 xor eax,eax
8: 39 ce cmp esi,ecx
a: 40 0f b6 ff movzx edi,dil
e: 0f 94 c0 sete al
11: 01 f8 add eax,edi
13: 39 ca cmp edx,ecx
15: 0f 94 c2 sete dl
18: 0f b6 d2 movzx edx,dl
1b: 01 d0 add eax,edx
1d: c3 ret
As can be seen, gcc extends the inputs to 32-bit (with xor or movzx) and uses
32-bit additions, even though it could avoid this by using 8-bit additions and
a single movzx at the end.
Let's try to force it:
int bar( int a, int b, int c, int x )
{
return
(uint8_t)((uint8_t)((uint8_t)(a==x)+(uint8_t)(b==x))+(uint8_t)(c==x));
}
0000000000000020 <bar>:
20: 39 ce cmp esi,ecx
22: 40 0f 94 c6 sete sil
26: 39 cf cmp edi,ecx
28: 0f 94 c0 sete al
2b: 01 f0 add eax,esi
2d: 39 ca cmp edx,ecx
2f: 0f 94 c2 sete dl
32: 01 d0 add eax,edx
34: 0f b6 c0 movzx eax,al
37: c3 ret
Closer -- we saved two instructions -- but this is actually worse: the
additions will have false dependencies on the previous contents of those
registers. What we WANT is this:
cmp esi,ecx
sete sil
cmp edi,ecx
sete al
add al,sil
cmp edx,ecx
sete dl
add al,dl
movzx eax,al
ret
But gcc won't generate it no matter how much I attempt to massage it into doing
so.
Is this a missing optimization or an optimization bug?