On Tue, Jan 26, 2021 at 10:56 PM Xionghu Luo <luo...@linux.ibm.com> wrote: > > Hi, > > On 2021/1/27 03:00, David Edelsohn wrote: > > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo <luo...@linux.ibm.com> wrote: > >> > >> From: "luo...@cn.ibm.com" <luo...@cn.ibm.com> > >> > >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT > >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for > >> variable vector insert. Remove rs6000_expand_vector_set_var helper > >> function, adjust the p8 and p9 definitions position and make them > >> static. > >> > >> The previous commit r11-6858 missed check m32, This patch is tested pass > >> on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with > >> RUNTESTFLAGS="--target_board =unix'{-m32,-m64}" for BE targets. > > > > Hi, Xionghu > > > > Thanks for addressing these failures and the cleanups. > > > > This patch addresses most of the failures. > > > > pr79251-run.c continues to fail. The directives are not complete. > > I'm not certain if your intention is to run the testcase on all > > targets or only on Power7 and above. The testcase relies on vector > > "long long", which only is available with -mvsx, but the testcase only > > enables -maltivec. I believe that the testcase happens to pass on the > > Linux platforms you tested because GCC defaulted to Power7 or Power8 > > ISA and the ABI specifies VSX. The testcase probably needs to be > > restricted to only run on some level of VSX enabled processor (VSX? > > Power8? Power9?) and also needs some additional compiler options when > > compiling the testcase instead of relying upon the default > > configuration of the compiler. > > > P8BE: gcc/testsuite/gcc/gcc.sum(it didn't run before due to no 'dg-do run'): > > Running target unix/-m32 > Running > /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... > PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors) > PASS: gcc.target/powerpc/pr79251-run.c execution test > === gcc Summary for unix/-m32 === > > # of expected passes 2 > Running target unix/-m64 > Running > /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... > PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors) > PASS: gcc.target/powerpc/pr79251-run.c execution test > === gcc Summary for unix/-m64 === > > # of expected passes 2 > > > How did you get the failure of pr79251-run.c, please? I tested it all > passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE of Linux. This case is > just verifying the *functionality* of "u = vec_insert (254, v, k)" and > compare whether u[k] is changed to 254, it must work on all platforms, > no matter with the optimization or not, otherwise there is a functional > error. As to "long long", add target vsx_hw and powerpc like below?
AIX is not Linux. Linux ABI requires VSX, at least when it defaults to Power7 and above. AIX does not. I do not know if you have access to AIX systems and I don't want to overly-complicate the testing by adding AIX. The issue is that you testcase is assuming VSX, but nothing in the directives to the testcase ensured that VSX was enabled. > (Also change the -maltive to -mvsx for pr79251.p8.c/pr79251.p9.c.) > > --- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c > @@ -1,4 +1,6 @@ > -/* { dg-options "-O2 -maltivec" } */ > +/* { dg-do run { target powerpc*-*-* } } */ Do not add target powerpc*-*-*, the testcase already is in the gcc.target/powerpc directory. > +/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */ Similarly. And dg-require-effective-target should not have a "target" specifier. > +/* { dg-options "-O2 -mvsx" } */ I will test with this change. > > > Any other options necessary to limit the testcases? :) Again, it's a question of what, exactly, you are trying to test. This will test with the default code generation for the target. In other words, it may not test that the new P8 and P9 code generation paths run correctly. You would need separate "run p8" and "run p9" testcases that explicitly compile the code with -mdejagnu-cpu=power8 and -mdejagnu-cpu=power9. Your "compile" testcases check that the appropriate instructions appear in the assembly, but the "run" testcase doesn't verify that the p8 or p9 code runs correctly. Thanks, David > > > > > Also, part of the change seems to be > > > >> - if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8) > >> - rs6000_expand_vector_set_var_p9 (target, val, idx); > >> + if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8) > >> + { > >> + rs6000_expand_vector_set_var_p9 (target, val, elt_rtx); > >> + return; > >> + } > > > > Does the P9 case need TARGET_POWERPC64? This optimization seemed to > > be functioning on P9 in 32 bit mode prior to this fix. It would be a > > shame to unnecessarily disable this optimization in 32 bit mode. Or > > maybe it generated a functioning sequence but didn't utilize the > > optimization. Would you please check / clarify? > > > >> - if (TARGET_P8_VECTOR) > >> + if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT) > >> { > >> stmt = build_array_ref (loc, stmt, arg2); > >> stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt, > > > This change in rs6000-c.c causes it not generating > VIEW_CONVERT_EXPR(ARRAY_REF) > gimple code again for P9-32bit, then the IFN VEC_SET won't be matched, > so rs6000.c:rs6000_expand_vector_set_var_p9 won't be called to produce > optimized "lvsl+xxperm+lvsr" for P9-32bit again. It's a pity, but without > this, it ICEs on P8BE-32bit because of UNSPEC_SI_FROM_SF is not supported > for -m32, if we need to support P9-32bit, why not also support P8-32bit as > only float vec_insert ICE, is there any method could move SI from SF for > P8-32bit? > (I verified the m32 optimized binary and non-optimized binary for int > vec_insert > on P8-BE-32bit, the performance gain is also huge as 8x improvement with this > patch.) > > > rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx) > { > ... > /* mtvsrd[wz] f0,tmp_val. */ > rtx tmp_val = gen_reg_rtx (SImode); > if (inner_mode == E_SFmode) > emit_insn (gen_movsi_from_sf (tmp_val, val)); > else > tmp_val = force_reg (SImode, val); > ... > } > > > Thanks > Xionghu > >