https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Example that we don't transform but could:

typedef int v4si __attribute__((vector_size(16)));

#define min(a,b) ((a)<(b)?(a):(b))

v4si foo (v4si a, v4si b)
{
  a[0] = min (a[0], b[0]);
  return a;
}

here the scalar code is

        movd    %xmm0, %edx
        movd    %xmm1, %eax
        cmpl    %edx, %eax
        cmovg   %edx, %eax
        pinsrd  $0, %eax, %xmm0

where we could use sth like

        movq    %xmm0, %xmm2
        minpd   %xmm2, %xmm1
        <some pack/unpack/palign or whatever>

a testcase variant could return the scalar minimum.  For both cases it's
likely a win even for -Os.
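
For reference, a rough source-level sketch of the two ideas above, using only
GNU vector extensions.  The names foo_scalar and foo_vec are hypothetical and
not part of the report; foo_vec is a hand-written illustration of "full-width
min, then keep lane 0", not what GCC emits or would necessarily emit.

```c
typedef int v4si __attribute__((vector_size(16)));

#define min(a,b) ((a)<(b)?(a):(b))

/* Testcase from the comment: only lane 0 is updated.  */
v4si foo (v4si a, v4si b)
{
  a[0] = min (a[0], b[0]);
  return a;
}

/* Hypothetical variant that returns the scalar minimum of lane 0.  */
int foo_scalar (v4si a, v4si b)
{
  return min (a[0], b[0]);
}

/* Hypothetical hand-vectorized form: compute the minimum on all lanes,
   then merge only lane 0 back into a.  */
v4si foo_vec (v4si a, v4si b)
{
  v4si lt = a < b;                      /* -1 in lanes where a < b, else 0 */
  v4si m = (a & lt) | (b & ~lt);        /* lane-wise minimum */
  const v4si lane0 = { -1, 0, 0, 0 };   /* selects lane 0 only */
  return (m & lane0) | (a & ~lane0);
}
```

foo and foo_vec should return the same vector for any inputs; whether the
vector form is actually cheaper depends on the target and on which min and
blend instructions are available.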