https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46091
--- Comment #9 from Avi Kivity <a...@cloudius-systems.com> ---

I believe the comment is wrong. Here's what the manual says about BTC:

"This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically."

This implies that without the LOCK prefix, it is not atomic. XCHG (with a memory operand) is the only instruction that asserts LOCK implicitly.

Agner Fog's instruction tables list BTC's reciprocal throughput as 1 for the imm, mem form and 5 for the reg, mem form. The latter is slow, but perhaps still worthwhile as a replacement for the code in the first comment (but probably not when addressing a single word). Note there is also the BT instruction (with a reciprocal throughput of 0.5!).
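To make the atomicity point concrete, here is a rough C sketch of what BT, plain BTC and LOCK BTC do on a bit string in memory. The helper names are hypothetical (they are not from this bug or the first comment). It assumes 64-bit words and a non-negative bit index. The hardware reg, mem form also accepts a negative offset. The atomic variant uses GCC's __atomic_fetch_xor builtin. No claim is made about which instruction GCC emits for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers modelling bit-string addressing: bit index `bit`
 * selects word bit/64 and bit bit%64 within that word (64-bit words and
 * non-negative indices assumed). */

/* Like BT: read the selected bit, memory is left unchanged. */
static bool bit_test(const uint64_t *words, size_t bit)
{
    return (words[bit / 64] >> (bit % 64)) & 1u;
}

/* Like BTC without LOCK: a plain load, XOR, store. This is not atomic;
 * a concurrent update to the same word between the load and the store
 * can be lost. Returns the previous value of the bit. */
static bool bit_test_complement(uint64_t *words, size_t bit)
{
    uint64_t mask = UINT64_C(1) << (bit % 64);
    uint64_t old = words[bit / 64];
    words[bit / 64] = old ^ mask;
    return (old & mask) != 0;
}

/* Like LOCK BTC: the XOR is done atomically via GCC's __atomic builtin,
 * and the fetched old word yields the previous value of the bit. */
static bool bit_test_complement_atomic(uint64_t *words, size_t bit)
{
    uint64_t mask = UINT64_C(1) << (bit % 64);
    uint64_t old = __atomic_fetch_xor(&words[bit / 64], mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}
```

For example, with `uint64_t w[2] = {0, 0}`, `bit_test_complement(w, 70)` flips bit 6 of `w[1]` and returns the old bit (false). The per-call word and mask arithmetic is the part a single-word case would not need.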