https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84935
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It seems it actually is vectorized, probably just using DImode vectors for 2xSImode, and dom doesn't forward values from vector stores to subsequent scalar loads. Before store-merging the dump is:

  MEM[(int *)&a] = { 0, 1 };
  MEM[(int *)&a + 8B] = { 4, 9 };
  MEM[(int *)&a + 16B] = { 16, 25 };
  MEM[(int *)&a + 24B] = { 36, 49 };
  MEM[(int *)&a + 32B] = { 64, 81 };
  _6 = a[0];
  _28 = a[1];
  res_29 = _6 + _28;
  _35 = a[2];
  res_36 = res_29 + _35;
  _42 = a[3];
  res_43 = res_36 + _42;
  _49 = a[4];
  res_50 = res_43 + _49;
  _56 = a[5];
  res_57 = res_50 + _56;
  _63 = a[6];
  res_64 = res_57 + _63;
  _70 = a[7];
  res_71 = res_64 + _70;
  _77 = a[8];
  res_78 = res_71 + _77;
  _2 = a[9];
  res_11 = _2 + res_78;
  a ={v} {CLOBBER};
  return res_11;

and nothing in it really changes until *.optimized.
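For readers less familiar with GIMPLE, here is a hedged C sketch of what the dump computes. The first function is only a guess at the shape of source that could produce it (the PR's actual testcase is not quoted here). The second writes out the post-vectorization form using the GCC/Clang `vector_size` extension: five 8-byte stores of two ints each, followed by ten scalar loads. Both function names are hypothetical, and the sketch assumes a 32-bit `int`. Each returns 285 (0 + 1 + 4 + ... + 81), the constant that dom would have to forward through the vector stores to fold `res_11`.

```c
#include <string.h>

/* Hypothetical source shape (an assumption inferred from the dump):
   fill a[i] = i * i, then sum the array with scalar loads. */
int sum_squares(void)
{
  int a[10];
  for (int i = 0; i < 10; i++)
    a[i] = i * i;
  int res = 0;
  for (int i = 0; i < 10; i++)
    res += a[i];
  return res;
}

/* 2 x int vector, 8 bytes wide (assumes 32-bit int), i.e. the
   2xSImode vector that fits in a DImode register. */
typedef int v2si __attribute__((vector_size(8)));

/* Same computation mirroring the GIMPLE above: vector stores of
   constant pairs, then scalar loads a[0] .. a[9] summed in order. */
int sum_squares_vectorized(void)
{
  int a[10];
  const v2si chunks[5] = { {0, 1}, {4, 9}, {16, 25}, {36, 49}, {64, 81} };
  for (int k = 0; k < 5; k++)
    memcpy(&a[2 * k], &chunks[k], sizeof chunks[k]); /* MEM[(int *)&a + 8*k B] = chunk */
  int res = 0;
  for (int i = 0; i < 10; i++)
    res += a[i];                                      /* scalar load a[i] */
  return res;
}
```

Calling either function from a `main` should give 285; compiling the first at `-O2` or `-O3` and inspecting `-fdump-tree-optimized` is one way to check whether the sum gets folded to a constant.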