https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98648
--- Comment #4 from Gabriel Ravier <gabravier at gmail dot com> ---
This code:

typedef int64_t v2di __attribute__((vector_size(16)));

v2di f(__m128 val)
{
    return (~(v2di)_mm_set_ps1(0.0f) & (v2di)val);
}

is optimized better (and is equivalent, if I understand the semantics of andnps right). Maybe the builtin for andnot should be thrown out as soon as possible (i.e. transformed into `~a & b`)? From what I can see, `~a & b` for vectors in general is optimized to an andnot operation too.
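To check the equivalence claim, here is a minimal sketch that compares the generic `~a & b` form with `_mm_andnot_ps`. It assumes an x86 target with SSE and a compiler that supports GCC vector extensions. The helper names (`andnot_generic`, `andnot_intrinsic`, `andnot_forms_agree`) are made up for illustration and do not come from the bug report.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

typedef int64_t v2di __attribute__((vector_size(16)));

/* Generic form from the comment, with the mask passed in as a
   parameter instead of being fixed to _mm_set_ps1(0.0f). */
static v2di andnot_generic(__m128 mask, __m128 val)
{
    return ~(v2di)mask & (v2di)val;
}

/* Intrinsic form: _mm_andnot_ps(a, b) computes (~a) & b bitwise. */
static v2di andnot_intrinsic(__m128 mask, __m128 val)
{
    return (v2di)_mm_andnot_ps(mask, val);
}

/* Hypothetical helper: returns 1 when both forms give bit-identical
   results for a mask and a value broadcast to all four lanes. */
int andnot_forms_agree(float mask_lane, float val_lane)
{
    v2di a = andnot_generic(_mm_set_ps1(mask_lane), _mm_set_ps1(val_lane));
    v2di b = andnot_intrinsic(_mm_set_ps1(mask_lane), _mm_set_ps1(val_lane));
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Calling `andnot_forms_agree(0.0f, 1.5f)` exercises the all-zero mask used in the code above. A mask of `-0.0f` exercises the case where only the sign bit is set. This only checks that the results match, not which instructions the compiler emits for each form.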