https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64897
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Since GCC 9 we get for fand1:

        movq    %xmm0, %rax
        btrq    $63, %rax
        movq    %rax, %xmm0
        ret

The question is whether moving between SSE registers and GPRs is cheaper than the load that an and instruction would cause.
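
For readers without the test case at hand, here is a minimal C sketch of the operation that sequence performs: btrq $63 clears bit 63, which is the sign bit of an IEEE 754 double. The name clear_sign_bit is hypothetical and this is not the fand1 source from this report; it also assumes that double is a 64-bit IEEE 754 binary64 value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not the fand1 test case from this report.
   Assumes double is a 64-bit IEEE 754 binary64 value. */
double clear_sign_bit(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the double as an integer */
    bits &= ~(UINT64_C(1) << 63);     /* clear bit 63, as btrq $63 does */
    memcpy(&x, &bits, sizeof x);      /* reinterpret the integer as a double */
    return x;
}
```

Whether a given compiler turns code like this into the GPR round trip shown above or into an SSE and with a mask loaded from memory is likely to depend on the compiler version and tuning flags.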