https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244
--- Comment #14 from dhowells at redhat dot com <dhowells at redhat dot com> ---
Okay, I built and booted an x86_64 kernel that had the XXX_bit() and
test_and_XXX_bit() ops altered to use the __atomic_fetch_YYY() functions. The
core kernel ended up ~8K larger in the .text segment.

Examining ext4_resize_begin() as an example, this statement:

	if (test_and_set_bit_lock(EXT4_RESIZING, &EXT4_SB(sb)->s_resize_flags))
		ret = -EBUSY;

looks like this in the unpatched kernel:

   0xffffffff812169f3 <+122>:	lock btsl $0x0,0x3b8(%rax)
   0xffffffff812169fc <+131>:	jb     0xffffffff81216a02
   0xffffffff812169fe <+133>:	xor    %edx,%edx
   0xffffffff81216a00 <+135>:	jmp    0xffffffff81216a07
   0xffffffff81216a02 <+137>:	mov    $0xfffffff0,%edx
   0xffffffff81216a07 <+142>:	mov    %edx,%eax

and like this in the patched kernel:

   0xffffffff81217414 <+122>:	xor    %edx,%edx
   0xffffffff81217416 <+124>:	lock btsq $0x0,0x3b8(%rax)
   0xffffffff81217420 <+134>:	setb   %dl
   0xffffffff81217423 <+137>:	neg    %edx
   0xffffffff81217425 <+139>:	and    $0xfffffff0,%edx
   0xffffffff81217428 <+142>:	mov    %edx,%eax

So it looks good here at least :-)

This also suggests there's an error in the current x86_64 kernel implementation:
the kernel bitops are supposed to operate on machine-word-sized locations, so it
should be using BTSQ, not BTSL. Fixing that would make the __atomic_fetch_or()
variant a byte shorter overall, and it involves no conditional jumps.
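
For reference, a minimal sketch of what the patched test_and_set_bit_lock() could
look like when built on the GCC builtin, assuming a bitmap of unsigned long words
and acquire ordering on the set; the exact signature and file layout of the real
kernel helper differ, this only shows the shape of the __atomic_fetch_or() variant:

#include <stdbool.h>

/* Illustrative stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(long) * 8)

/*
 * Sketch of test_and_set_bit_lock() built on __atomic_fetch_or(), as
 * described above.  Returns the previous value of bit 'nr' in the bitmap
 * at 'addr', setting it with acquire semantics.
 */
static inline bool test_and_set_bit_lock(unsigned int nr,
					 volatile unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long old;

	/* Acquire ordering is enough for a lock-acquisition bitop. */
	old = __atomic_fetch_or(addr + nr / BITS_PER_LONG, mask,
				__ATOMIC_ACQUIRE);
	return (old & mask) != 0;
}

When the bit number is constant and only the returned bit is tested, GCC can
lower this fetch-or-plus-test pattern to the single lock btsq / setb sequence
shown in the patched disassembly above, rather than a cmpxchg loop.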