https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40168
--- Comment #22 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
With the trunk on the second testcase, SLP works, but we get stuff like:
  _2 = (*b_420(D))[80];
  _4 = (*b_420(D))[79];
  _538 = {_2, _4};
  _980 = _538 * vect__1.182_559;

Shouldn't that just be loading a vector from (*b_420(D))[79] and then doing a
VEC_PERM?

In a reduced C testcase we get the correct thing:
typedef double array[1000];

void f(array *a, array *b, array *c)
{
  double t = (*a)[1] * (*b)[0];
  double t1 = (*a)[0] * (*b)[0];
  double t2 = (*a)[3] * (*b)[1];
  double t3 = (*a)[2] * (*b)[1];
  (*c)[0] = t;
  (*c)[1] = t1;
  (*c)[2] = t2;
  (*c)[3] = t3;
}
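
For illustration only, here is a minimal sketch of the two forms being compared for the {(*b)[80], (*b)[79]} constructor, written with the generic vector_size extension. The names v2df, build_from_scalars, load_then_permute and forms_agree are hypothetical and do not come from the testcase or from the vectorizer. The explicit lane swap only stands in for what a VEC_PERM with selector {1, 0} would do.

```c
#include <string.h>

/* Two-lane double vector via the vector_size extension (assumed available). */
typedef double v2df __attribute__((vector_size(16)));
typedef double array[1000];

/* What the dump shows: two scalar loads feeding a vector constructor. */
static v2df build_from_scalars(array *b)
{
    double hi = (*b)[80];
    double lo = (*b)[79];
    v2df v = { hi, lo };
    return v;
}

/* The form asked about: one vector load at [79], then a lane swap. */
static v2df load_then_permute(array *b)
{
    v2df v;
    /* memcpy keeps the load valid even if &(*b)[79] is not 16-byte aligned. */
    memcpy(&v, &(*b)[79], sizeof v);   /* v = { b[79], b[80] } */
    v2df swapped = { v[1], v[0] };     /* stands in for a VEC_PERM with {1, 0} */
    return swapped;
}

/* Returns 1 when both forms yield identical lanes. */
static int forms_agree(array *b)
{
    v2df x = build_from_scalars(b);
    v2df y = load_then_permute(b);
    return x[0] == y[0] && x[1] == y[1];
}
```

Both functions compute the same value. Whether the second form is actually cheaper depends on the target, so this shows only the equivalence, not a cost claim.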