https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This is an attempt to vectorize by hand, but it seems we do not generate
vpmovsxwd for the vector short->double conversion:

struct pixels
{
  short a __attribute__ ((vector_size (4*2)));   /* 4 x short */
} *pixels;

struct dpixels
{
  double a __attribute__ ((vector_size (8*4)));  /* 4 x double */
};

typedef double v4df __attribute__ ((vector_size (32)));

struct dpixels
test (double *k)
{
  struct dpixels results = {};
  for (int u = 0; u < 10000; u++, k--)
    results.a += *k * __builtin_convertvector (pixels[u].a, v4df);
  return results;
}

clang seems to do the right thing here.