https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64897
Bug ID: 64897
Summary: Floating-point "and" not optimized on x86-64
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: schnetter at gmail dot com

I noticed that gcc does not generate "vandpd" for floating-point "and" operations. Here is example code that demonstrates this:

{{{
#include <math.h>
#include <string.h>

double fand1(double x)
{
  unsigned long ix;
  memcpy(&ix, &x, 8);
  ix &= 0x7fffffffffffffffUL;
  memcpy(&x, &ix, 8);
  return x;
}

double fand2(double x)
{
  return fabs(x);
}
}}}

When I compile this via:

{{{
gcc-mp-4.9 -O3 -march=native -S fand.c -o fand-gcc-4.9.s
}}}

(OS X, x86-64 CPU, Intel Core i7), this results in:

{{{
_fand1:
	movabsq	$9223372036854775807, %rax
	vmovd	%xmm0, %rdx
	andq	%rdx, %rax
	vmovd	%rax, %xmm0
	ret
_fand2:
	vmovsd	LC1(%rip), %xmm1
	vandpd	%xmm1, %xmm0, %xmm0
	ret
}}}

This shows that (a) for fand1, gcc moves the value out of %xmm0 into a general-purpose register, performs the bitwise "and" there, and moves the result back, which is probably slower than staying in the vector register; and (b) the implementors of "fabs" assume that a single "vandpd" with a constant mask is faster. Both functions clear only the sign bit and therefore compute the same result, so gcc could emit the fand2 sequence for fand1 as well.
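Until gcc does this automatically, the mask can be applied in the vector domain by hand. The following is a workaround sketch, not part of the original report: it assumes an SSE2-capable x86-64 target and uses the SSE2 intrinsics from `<emmintrin.h>` (`_mm_and_pd` compiles to andpd/vandpd), falling back to the integer version of fand1 elsewhere. The names `fand_vec`, `same_bits` and `check_fand_vec` are hypothetical, and the exact assembly gcc 4.9.2 produces for it has not been checked here.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: clear the sign bit of x, i.e. compute fabs(x) by masking.
   With SSE2 (baseline on x86-64) the AND is done with _mm_and_pd, so the value
   stays in an XMM register; other targets fall back to fand1's integer approach. */
static double fand_vec(double x)
{
#if defined(__SSE2__)
    /* All bits set except bit 63 (the sign bit), reinterpreted as doubles. */
    const __m128d mask = _mm_castsi128_pd(_mm_set1_epi64x(0x7fffffffffffffffLL));
    return _mm_cvtsd_f64(_mm_and_pd(_mm_set_sd(x), mask));
#else
    uint64_t ix;
    memcpy(&ix, &x, sizeof ix);
    ix &= UINT64_C(0x7fffffffffffffff);
    memcpy(&x, &ix, sizeof x);
    return x;
#endif
}

/* Return 1 if a and b have identical bit patterns (so -0.0 and +0.0 differ). */
static int same_bits(double a, double b)
{
    uint64_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia == ib;
}

/* Check that fand_vec agrees bit-for-bit with fabs on a few edge cases,
   including signed zeros, subnormals and infinities. */
static void check_fand_vec(void)
{
    const double inputs[] = { 1.5, -1.5, 0.0, -0.0, 1e-310, -1e-310,
                              INFINITY, -INFINITY };
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
        assert(same_bits(fand_vec(inputs[i]), fabs(inputs[i])));
}
```

Using `uint64_t` and `sizeof` in the fallback avoids relying on `unsigned long` being 64 bits wide, which holds on OS X and Linux x86-64 but not on 64-bit Windows.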