https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87206
            Bug ID: 87206
           Summary: Suboptimal code generation for
                    __atomic_compare_exchange_n followed by a comparison
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: iii at linux dot ibm.com
                CC: krebbel at gcc dot gnu.org
  Target Milestone: ---

I tried to build the example #5 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 on x86_64 and observed a
similar issue:

$ cat 1.c
extern void bar (int *);

void foo5(int *mem) {
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1, 1,
                               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);
}

$ gcc-8 -c 1.c -O3 -g
$ objdump -d 1.o
# skip
0000000000000000 <_foo5>:
   0:   31 c0                   xor    %eax,%eax
   2:   ba 01 00 00 00          mov    $0x1,%edx
   7:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
   b:   85 c0                   test   %eax,%eax
   d:   75 01                   jne    10 <_foo5+0x10>
   f:   c3                      retq
  10:   e9 00 00 00 00          jmpq   15 <_foo5+0x15>

We don't have to do "test %eax,%eax", because this information is already
available through ZF, which is set by CMPXCHG.

I wonder if it would be possible to come up with a common solution for all
architectures, including x86_64 and s390?
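
For comparison, here is a minimal sketch of a source-level variant that
branches on the builtin's boolean return value instead of re-reading oldval.
The name foo5_ret and the counting stand-in for bar are hypothetical and are
not part of the original test case. The assumption is that the compiler can
then take the branch condition directly from the flags produced by the
compare-and-swap. This has not been verified against gcc 8.2.0 here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the extern bar() in 1.c: counts its calls.  */
static int bar_calls;

static void bar (int *mem)
{
  (void) mem;
  bar_calls++;
}

/* Same operation as foo5, but the branch uses the success flag returned
   by __atomic_compare_exchange_n rather than comparing oldval with 0.  */
void foo5_ret (int *mem)
{
  int oldval = 0;
  bool ok = __atomic_compare_exchange_n (mem, &oldval, 1, 1,
                                         __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (!ok)
    bar (mem);
}
```

The two forms are not strictly equivalent. The exchange is requested as weak
(fourth argument 1), so on targets where a weak compare-and-swap may fail
spuriously the builtin can return false while oldval is still 0, and this
variant would then call bar where foo5 would not. On x86_64, where CMPXCHG
fails only when the values differ, both forms should call bar in the same
cases.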