On Tue, 20 Mar 2018, Rainer Orth wrote:

> Hi Tom,
>
> > On 03/19/2018 10:11 AM, Richard Biener wrote:
> >> On Fri, 16 Mar 2018, Tom de Vries wrote:
> >>
> >>> On 03/16/2018 12:55 PM, Richard Biener wrote:
> >>>> On Fri, 16 Mar 2018, Tom de Vries wrote:
> >>>>
> >>>>> On 02/27/2018 01:42 PM, Richard Biener wrote:
> >>>>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> >>>>>> ===================================================================
> >>>>>> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (nonexistent)
> >>>>>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (working copy)
> >>>>>> @@ -0,0 +1,15 @@
> >>>>>> +/* { dg-do compile } */
> >>>>>> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> >>>>>> +
> >>>>>> +int foo()
> >>>>>> +{
> >>>>>> +  int a[10];
> >>>>>> +  for(int i = 0; i < 10; ++i)
> >>>>>> +    a[i] = i*i;
> >>>>>> +  int res = 0;
> >>>>>> +  for(int i = 0; i < 10; ++i)
> >>>>>> +    res += a[i];
> >>>>>> +  return res;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
> >>>>>
> >>>>> This fails for nvptx, because it doesn't have the required vector
> >>>>> operations.
> >>>>> To fix the fail, I've added requiring effective target vect_int_mult.
> >>>>
> >>>> On targets that do not vectorize you should see the scalar loops unrolled
> >>>> instead. Or do you have only one loop vectorized?
> >>>
> >>> Sort of. Loop vectorization has no effect, and the scalar loops are completely
> >>> unrolled. But then slp vectorization vectorizes the stores.
> >>>
> >>> So at optimized we have:
> >>> ...
> >>>   MEM[(int *)&a] = { 0, 1 };
> >>>   MEM[(int *)&a + 8B] = { 4, 9 };
> >>>   MEM[(int *)&a + 16B] = { 16, 25 };
> >>>   MEM[(int *)&a + 24B] = { 36, 49 };
> >>>   MEM[(int *)&a + 32B] = { 64, 81 };
> >>>   _6 = a[0];
> >>>   _28 = a[1];
> >>>   res_29 = _6 + _28;
> >>>   _35 = a[2];
> >>>   res_36 = res_29 + _35;
> >>>   _42 = a[3];
> >>>   res_43 = res_36 + _42;
> >>>   _49 = a[4];
> >>>   res_50 = res_43 + _49;
> >>>   _56 = a[5];
> >>>   res_57 = res_50 + _56;
> >>>   _63 = a[6];
> >>>   res_64 = res_57 + _63;
> >>>   _70 = a[7];
> >>>   res_71 = res_64 + _70;
> >>>   _77 = a[8];
> >>>   res_78 = res_71 + _77;
> >>>   _2 = a[9];
> >>>   res_11 = _2 + res_78;
> >>>   a ={v} {CLOBBER};
> >>>   return res_11;
> >>> ...
> >>>
> >>> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
> >>> we have:
> >>> ...
> >>> .visible .func (.param.u32 %value_out) foo
> >>> {
> >>>         .reg.u32 %value;
> >>>         .local .align 16 .b8 %frame_ar[48];
> >>>         .reg.u64 %frame;
> >>>         cvta.local.u64 %frame, %frame_ar;
> >>>         mov.u32 %value, 285;
> >>>         st.param.u32 [%value_out], %value;
> >>>         ret;
> >>> }
> >>> ...
> >>>
> >>>> That's precisely
> >>>> what the PR was about... which means it isn't fixed for nvptx :/
> >>>
> >>> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
> >>> code at optimized.
> >>>
> >>> FWIW, using this patch we generate optimal code at optimized:
> >>> ...
> >>> diff --git a/gcc/passes.def b/gcc/passes.def
> >>> index 3ebcfc30349..6b64f600c4a 100644
> >>> --- a/gcc/passes.def
> >>> +++ b/gcc/passes.def
> >>> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3. If not see
> >>>        NEXT_PASS (pass_tracer);
> >>>        NEXT_PASS (pass_thread_jumps);
> >>>        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> >>> +      NEXT_PASS (pass_fre);
> >>>        NEXT_PASS (pass_strlen);
> >>>        NEXT_PASS (pass_thread_jumps);
> >>>        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> >>> ...
> >>>
> >>> and we get:
> >>> ...
> >>> .visible .func (.param.u32 %value_out) foo
> >>> {
> >>>         .reg.u32 %value;
> >>>         mov.u32 %value, 285;
> >>>         st.param.u32 [%value_out], %value;
> >>>         ret;
> >>> }
> >>> ...
> >>>
> >>> I could file a missing optimization PR for nvptx, but I'm not sure where this
> >>> should be fixed.
> >>
> >> Ah, yeah... the usual issue then.
> >>
> >> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
> >>
> >
> > Done.
> >
> > Committed at attached.
>
> this caused the test to FAIL on 64-bit (only) sparc-sun-solaris2.11:
>
> FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
>
> where it was UNSUPPORTED before.

So it failed before Tom's original patch. Please add sparc-solaris
to the list of XFAILed targets.

> The dump has
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1557, cgraph_uid=0, symbol_order=0)
>
> foo ()
> {
>   int res;
>   int a[10];
>   int _2;
>   int _6;
>   int _28;
>   int _35;
>   int _42;
>   int _49;
>   int _56;
>   int _63;
>   int _70;
>   int _77;
>
>   <bb 2> [local count: 97603132]:
>   MEM[(int *)&a] = { 0, 1 };
>   MEM[(int *)&a + 8B] = { 4, 9 };
>   MEM[(int *)&a + 16B] = { 16, 25 };
>   MEM[(int *)&a + 24B] = { 36, 49 };
>   MEM[(int *)&a + 32B] = { 64, 81 };
>   _6 = a[0];
>   _28 = a[1];
>   res_29 = _6 + _28;
>   _35 = a[2];
>   res_36 = res_29 + _35;
>   _42 = a[3];
>   res_43 = res_36 + _42;
>   _49 = a[4];
>   res_50 = res_43 + _49;
>   _56 = a[5];
>   res_57 = res_50 + _56;
>   _63 = a[6];
>   res_64 = res_57 + _63;
>   _70 = a[7];
>   res_71 = res_64 + _70;
>   _77 = a[8];
>   res_78 = res_71 + _77;
>   _2 = a[9];
>   res_11 = _2 + res_78;
>   a ={v} {CLOBBER};
>   return res_11;
>
> }
>
>         Rainer
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)