https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84935
--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 19 Mar 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84935
>
> --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Seems it actually is vectorized, probably just using DImode vectors for
> 2xSImode, and dom doesn't handle vector stores followed by scalar loads.
> Before store-merging the dump is:
>   MEM[(int *)&a] = { 0, 1 };
>   MEM[(int *)&a + 8B] = { 4, 9 };
>   MEM[(int *)&a + 16B] = { 16, 25 };
>   MEM[(int *)&a + 24B] = { 36, 49 };
>   MEM[(int *)&a + 32B] = { 64, 81 };
>   _6 = a[0];
>   _28 = a[1];
>   res_29 = _6 + _28;
>   _35 = a[2];
>   res_36 = res_29 + _35;
>   _42 = a[3];
>   res_43 = res_36 + _42;
>   _49 = a[4];
>   res_50 = res_43 + _49;
>   _56 = a[5];
>   res_57 = res_50 + _56;
>   _63 = a[6];
>   res_64 = res_57 + _63;
>   _70 = a[7];
>   res_71 = res_64 + _70;
>   _77 = a[8];
>   res_78 = res_71 + _77;
>   _2 = a[9];
>   res_11 = _2 + res_78;
>   a ={v} {CLOBBER};
>   return res_11;
> and nothing really changes till *.optimized in it.  Similar as with nvptx.

Yes, DOM doesn't handle this while FRE would.
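
For readers without the testcase at hand, here is a minimal C sketch of a function whose shape matches the quoted dump: ten squares stored into a local array, then summed. It is an assumed reconstruction from the constants in the dump, not the testcase attached to the PR, and the name sum_squares is made up for illustration.

```c
/* Hypothetical reconstruction from the quoted dump; NOT the PR's testcase.
   The first loop corresponds to the stores { 0, 1 }, { 4, 9 }, ... { 64, 81 },
   the second to the chain of scalar loads a[0] .. a[9] feeding res.  */
int sum_squares(void)
{
    int a[10];
    int res = 0;

    /* Fill the local array with i*i.  */
    for (int i = 0; i < 10; i++)
        a[i] = i * i;

    /* Sum the elements back with scalar loads.  */
    for (int i = 0; i < 10; i++)
        res += a[i];

    return res;
}
```

If the scalar loads were forwarded from the vector stores, the whole function could fold to the constant 285 (0 + 1 + 4 + ... + 81).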