https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
Agner Fog <agner at agner dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |agner at agner dot org

--- Comment #8 from Agner Fog <agner at agner dot org> ---
The same problem applies to other kinds of optimizations, such as algebraic
reductions and constant propagation.

The method of using operators such as * and + is not portable to other
compilers, and it doesn't work with integer vectors for integer sizes other
than 64 bits. (I know that there is no integer FMA on Intel CPUs, but I am
also talking about other optimizations.)

Here are some other examples of optimizations I would like gcc to do:

#include "x86intrin.h"

void dummy2(__m128 a, __m128 b);
void dummyi2(__m128i a, __m128i b);

void commutative(__m128 a, __m128 b) {
    // expect reduce a+b = b+a. This is the only reduction that actually works!
    dummy2(_mm_add_ps(a,b), _mm_add_ps(b,a));
}

void associative(__m128i a, __m128i b, __m128i c) {
    // expect reduce (a+b)+c = a+(b+c)
    dummyi2(_mm_add_epi32(_mm_add_epi32(a,b),c),
            _mm_add_epi32(a,_mm_add_epi32(b,c)));
}

void distributive(__m128i a, __m128i b, __m128i c) {
    // expect reduce a*b+a*c = a*(b+c)
    dummyi2(_mm_add_epi32(_mm_mul_epi32(a,b),_mm_mul_epi32(a,c)),
            _mm_mul_epi32(a,_mm_add_epi32(b,c)));
}

void constant_propagation() {
    // expect store c and d as precalculated constants
    __m128i a = _mm_setr_epi32(1,2,3,4);
    __m128i b = _mm_set1_epi32(5);
    __m128i c = _mm_add_epi32(a,b);
    __m128i d = _mm_mul_epi32(a,b);
    dummyi2(c,d);
}

Of course, the same applies to 256-bit and 512-bit vectors.
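For comparison, here is a minimal sketch of what the operator-based method mentioned above might look like. It assumes this refers to the generic vector extensions (`vector_size` attribute) available in GCC and Clang. The names `v4u`, `distributive_lhs`, `distributive_rhs`, `folded_sum`, `folded_product`, and `check_identities` are hypothetical and do not come from the bug report. Note that `*` here is an element-wise 32-bit multiply, which corresponds to `_mm_mullo_epi32` rather than `_mm_mul_epi32`. Whether a given compiler version actually reduces or folds these expressions has to be checked in the generated assembly; the sketch only verifies that the identities hold at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Generic vector type (GCC/Clang extension): four 32-bit lanes in 16 bytes.
   Unsigned lanes are used so that wraparound is well-defined. */
typedef uint32_t v4u __attribute__((vector_size(16)));

/* a*b + a*c written with plain operators (element-wise). */
v4u distributive_lhs(v4u a, v4u b, v4u c) { return a * b + a * c; }

/* a*(b+c), the reduced form an optimizer could produce. */
v4u distributive_rhs(v4u a, v4u b, v4u c) { return a * (b + c); }

/* Candidates for constant propagation: all inputs are literals. */
v4u folded_sum(void) {
    v4u a = {1, 2, 3, 4};
    v4u b = {5, 5, 5, 5};
    return a + b;
}

v4u folded_product(void) {
    v4u a = {1, 2, 3, 4};
    v4u b = {5, 5, 5, 5};
    return a * b;
}

/* Run-time check that both forms agree lane by lane. */
void check_identities(void) {
    v4u a = {1, 2, 3, 4};
    v4u b = {5, 6, 7, 8};
    v4u c = {9, 10, 11, 12};
    v4u lhs = distributive_lhs(a, b, c);
    v4u rhs = distributive_rhs(a, b, c);
    for (int i = 0; i < 4; i++) {
        assert(lhs[i] == rhs[i]);
    }
}
```

Compiling this with `-O2 -S` and comparing the output of `distributive_lhs` and `distributive_rhs` is one way to see whether the reduction happens for the operator form.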