https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86693
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #5)
> (In reply to Hongtao.liu from comment #4)
> > Change testcase a little bit, gcc now can generate lock btc
> >
> >
> > void func1();
> >
> > void func(unsigned long *counter)
> > {
> >   if (__atomic_fetch_xor(counter, 1, __ATOMIC_ACQ_REL) & 1) {
> >     func1();
> >   }
> > }
> >
> >
> > func(unsigned long*):
> >         lock btc        QWORD PTR [rdi], 0
> >         jc      .L4
> >         ret
> > .L4:
> >         jmp     func1()
>
> We should rewrite the original test to the canonical form, similar to
> r12-5102.
> Hongtao, can you do that?

The original testcase is not equivalent to btc: __atomic_fetch_xor(counter, 1, __ATOMIC_ACQ_REL) == 1 requires all other bits of *counter to be 0, whereas btc only reports the old value of bit 0 (in the carry flag).
And, as noted in comment #1, we don't have instructions that keep the old or new value of an xor or ior in a register, so I can't find a way to optimize away the exchange loop.