https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84958
Bug ID: 84958
Summary: int loads not eliminated against larger stores
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

[ As discussed here: https://gcc.gnu.org/ml/gcc-patches/2018-03/msg00800.html ]

This test-case:
...
int foo()
{
  int a[10];
  for(int i = 0; i < 10; ++i)
    a[i] = i*i;
  int res = 0;
  for(int i = 0; i < 10; ++i)
    res += a[i];
  return res;
}
...
compiled with -O3 results in this gimple at optimized:
...
  MEM[(int *)&a] = { 0, 1 };
  MEM[(int *)&a + 8B] = { 4, 9 };
  MEM[(int *)&a + 16B] = { 16, 25 };
  MEM[(int *)&a + 24B] = { 36, 49 };
  MEM[(int *)&a + 32B] = { 64, 81 };
  _6 = a[0];
  _28 = a[1];
  res_29 = _6 + _28;
  _35 = a[2];
  res_36 = res_29 + _35;
  _42 = a[3];
  res_43 = res_36 + _42;
  _49 = a[4];
  res_50 = res_43 + _49;
  _56 = a[5];
  res_57 = res_50 + _56;
  _63 = a[6];
  res_64 = res_57 + _63;
  _70 = a[7];
  res_71 = res_64 + _70;
  _77 = a[8];
  res_78 = res_71 + _77;
  _2 = a[9];
  res_11 = _2 + res_78;
  a ={v} {CLOBBER};
  return res_11;
...

Loop vectorization has no effect, and the scalar loops are completely unrolled.
Then slp vectorization vectorizes the stores.

When disabling slp vectorization, we have instead:
...
  return 285;
...

[ FWIW, adding an extra fre pass here also results in optimal gimple:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 3ebcfc30349..6b64f600c4a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -325,6 +325,7 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
+      NEXT_PASS (pass_fre);
       NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
...
]
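
For anyone wanting to reproduce this, the following is a minimal sketch of the test-case as a standalone translation unit, together with a check that the fully folded result is indeed 285 (the sum of i*i for i = 0..9, i.e. 9*10*19/6). The names foo_expected and check_foo are hypothetical helpers added for illustration only; they are not part of the report.

```c
#include <assert.h>

/* Test-case from the report: store ten squares, then sum them back.
   Ideally every load of a[i] is forwarded from the preceding stores
   and the whole function folds to a constant. */
int foo(void)
{
    int a[10];
    for (int i = 0; i < 10; ++i)
        a[i] = i * i;
    int res = 0;
    for (int i = 0; i < 10; ++i)
        res += a[i];
    return res;
}

/* Hypothetical helper (not in the report): closed form for
   0^2 + 1^2 + ... + (n-1)^2 = (n-1) * n * (2n-1) / 6, with n = 10. */
int foo_expected(void)
{
    const int n = 10;
    return (n - 1) * n * (2 * n - 1) / 6;
}

/* Hypothetical self-check (not in the report): the computed sum must
   match the constant seen in the gimple when slp vectorization is
   disabled. */
void check_foo(void)
{
    assert(foo() == foo_expected());
    assert(foo_expected() == 285);
}
```

Assuming a GCC build that shows the behaviour, compiling this with -O3 -fdump-tree-optimized should produce a dump to compare against the gimple above, and adding -fno-tree-slp-vectorize is one way to disable slp vectorization for the comparison.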