https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82259
Bug ID: 82259
Summary: missed optimization: use LEA to add 1 to flip the low bit when
         copying before AND with 1
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

bool bt_signed(int x, unsigned bit) {
    bit = 13;
    return !(x & (1<<bit));
}
// https://godbolt.org/g/rzdtzm
        movl    %edi, %eax
        sarl    $13, %eax
        notl    %eax
        andl    $1, %eax
        ret

This is pretty good, but we could do better by using addition instead of a
separate NOT.  (XOR is add-without-carry, so adding 1 always flips the low
bit; any carry into the higher bits is discarded by the AND with 1.)

        sarl    $13, %edi
        lea     1(%edi), %eax
        andl    $1, %eax
        ret

If partial registers aren't a problem, this will be even better on most
CPUs:

        bt      $13, %edi
        setz    %al
        ret

Related: bug 47769 about missed BTR peepholes.  That probably covers the
missed BT.  But *this* bug is about the LEA+AND vs. MOV+NOT+AND
optimization.  This might be relevant for other 2-operand ISAs with mostly
destructive instructions, like ARM Thumb.

Related:

bool bt_unsigned(unsigned x, unsigned bit) {
    //bit = 13;
    return !(x & (1<<bit));    // 1U avoids test/set
}
        movl    %esi, %ecx
        movl    $1, %eax
        sall    %cl, %eax
        testl   %edi, %eax
        sete    %al
        ret

This is weird.  The code generated with 1U << bit is like the bt_signed
code above and has identical results, so gcc should emit whatever is
optimal for both cases.  There are similar differences on ARM32.  (With a
fixed count, it just makes the difference between NOT vs. XOR $1.)

If we're going to use setcc, it's definitely *much* better to use bt
instead of a variable-count shift + test:

        bt      %esi, %edi
        setz    %al
        ret
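
(Not part of the original report: a minimal C sanity check of the two
identities above.  Helper names like bt_signed_ref are invented for this
sketch.  It assumes gcc's arithmetic right shift for negative x, the same
assumption the sarl above makes, and restricts bit < 31 so 1<<bit doesn't
overflow signed int.)

/* Verifies that the "+1 then AND 1" trick and the 1U<<bit variant
 * match the straightforward expression. */
#include <stdio.h>
#include <stdbool.h>
#include <limits.h>

static bool bt_signed_ref(int x, unsigned bit) {
    return !(x & (1 << bit));            /* what the source computes */
}

static bool bt_lea_trick(int x, unsigned bit) {
    return ((x >> bit) + 1) & 1;         /* add 1 flips bit 0; AND drops the carry */
}

static bool bt_unsigned_ref(unsigned x, unsigned bit) {
    return !(x & (1U << bit));           /* the 1U variant */
}

int main(void) {
    const int vals[] = { 0, 1, -1, 0x2000, 0x12345678, INT_MIN };
    for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; i++) {
        for (unsigned bit = 0; bit < 31; bit++) {
            bool r = bt_signed_ref(vals[i], bit);
            if (r != bt_lea_trick(vals[i], bit) ||
                r != bt_unsigned_ref((unsigned)vals[i], bit)) {
                printf("mismatch: x=%d bit=%u\n", vals[i], bit);
                return 1;
            }
        }
    }
    puts("all three agree for bit < 31");
    return 0;
}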