This patch is okay with the removal of { target powerpc*-*-* }
from the pr79251-run.c testcase directives. As I explained in the earlier email, I still believe that the testcase is not testing what you intend, but this patch is a definite improvement and removes the failures. We can correct the testcase in a follow-up patch. Thanks for the clarification about P9 support. 32 bit doesn't have a fast mechanism to move SImode to SFmode. Thanks, David On Tue, Jan 26, 2021 at 10:56 PM Xionghu Luo <luo...@linux.ibm.com> wrote: > > Hi, > > On 2021/1/27 03:00, David Edelsohn wrote: > > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo <luo...@linux.ibm.com> wrote: > >> > >> From: "luo...@cn.ibm.com" <luo...@cn.ibm.com> > >> > >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT > >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for > >> variable vector insert. Remove rs6000_expand_vector_set_var helper > >> function, adjust the p8 and p9 definitions position and make them > >> static. > >> > >> The previous commit r11-6858 missed check m32, This patch is tested pass > >> on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with > >> RUNTESTFLAGS="--target_board =unix'{-m32,-m64}" for BE targets. > > > > Hi, Xionghu > > > > Thanks for addressing these failures and the cleanups. > > > > This patch addresses most of the failures. > > > > pr79251-run.c continues to fail. The directives are not complete. > > I'm not certain if your intention is to run the testcase on all > > targets or only on Power7 and above. The testcase relies on vector > > "long long", which only is available with -mvsx, but the testcase only > > enables -maltivec. I believe that the testcase happens to pass on the > > Linux platforms you tested because GCC defaulted to Power7 or Power8 > > ISA and the ABI specifies VSX. The testcase probably needs to be > > restricted to only run on some level of VSX enabled processor (VSX? > > Power8? Power9?) and also needs some additional compiler options when > > compiling the testcase instead of relying upon the default > > configuration of the compiler. > > > P8BE: gcc/testsuite/gcc/gcc.sum(it didn't run before due to no 'dg-do run'): > > Running target unix/-m32 > Running > /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... > PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors) > PASS: gcc.target/powerpc/pr79251-run.c execution test > === gcc Summary for unix/-m32 === > > # of expected passes 2 > Running target unix/-m64 > Running > /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... > PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors) > PASS: gcc.target/powerpc/pr79251-run.c execution test > === gcc Summary for unix/-m64 === > > # of expected passes 2 > > > How did you get the failure of pr79251-run.c, please? I tested it all > passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE of Linux. This case is > just verifying the *functionality* of "u = vec_insert (254, v, k)" and > compare whether u[k] is changed to 254, it must work on all platforms, > no matter with the optimization or not, otherwise there is a functional > error. As to "long long", add target vsx_hw and powerpc like below? > (Also change the -maltive to -mvsx for pr79251.p8.c/pr79251.p9.c.) > > --- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c > @@ -1,4 +1,6 @@ > -/* { dg-options "-O2 -maltivec" } */ > +/* { dg-do run { target powerpc*-*-* } } */ > +/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */ > +/* { dg-options "-O2 -mvsx" } */ > > > Any other options necessary to limit the testcases? :) > > > > > Also, part of the change seems to be > > > >> - if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8) > >> - rs6000_expand_vector_set_var_p9 (target, val, idx); > >> + if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8) > >> + { > >> + rs6000_expand_vector_set_var_p9 (target, val, elt_rtx); > >> + return; > >> + } > > > > Does the P9 case need TARGET_POWERPC64? This optimization seemed to > > be functioning on P9 in 32 bit mode prior to this fix. It would be a > > shame to unnecessarily disable this optimization in 32 bit mode. Or > > maybe it generated a functioning sequence but didn't utilize the > > optimization. Would you please check / clarify? > > > >> - if (TARGET_P8_VECTOR) > >> + if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT) > >> { > >> stmt = build_array_ref (loc, stmt, arg2); > >> stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt, > > > This change in rs6000-c.c causes it not generating > VIEW_CONVERT_EXPR(ARRAY_REF) > gimple code again for P9-32bit, then the IFN VEC_SET won't be matched, > so rs6000.c:rs6000_expand_vector_set_var_p9 won't be called to produce > optimized "lvsl+xxperm+lvsr" for P9-32bit again. It's a pity, but without > this, it ICEs on P8BE-32bit because of UNSPEC_SI_FROM_SF is not supported > for -m32, if we need to support P9-32bit, why not also support P8-32bit as > only float vec_insert ICE, is there any method could move SI from SF for > P8-32bit? > (I verified the m32 optimized binary and non-optimized binary for int > vec_insert > on P8-BE-32bit, the performance gain is also huge as 8x improvement with this > patch.) > > > rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx) > { > ... > /* mtvsrd[wz] f0,tmp_val. */ > rtx tmp_val = gen_reg_rtx (SImode); > if (inner_mode == E_SFmode) > emit_insn (gen_movsi_from_sf (tmp_val, val)); > else > tmp_val = force_reg (SImode, val); > ... > } > > > Thanks > Xionghu > >