Hi, On 2021/1/27 03:00, David Edelsohn wrote: > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo <luo...@linux.ibm.com> wrote: >> >> From: "luo...@cn.ibm.com" <luo...@cn.ibm.com> >> >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for >> variable vector insert. Remove rs6000_expand_vector_set_var helper >> function, adjust the p8 and p9 definitions position and make them >> static. >> >> The previous commit r11-6858 missed check m32, This patch is tested pass >> on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with >> RUNTESTFLAGS="--target_board =unix'{-m32,-m64}" for BE targets. > > Hi, Xionghu > > Thanks for addressing these failures and the cleanups. > > This patch addresses most of the failures. > > pr79251-run.c continues to fail. The directives are not complete. > I'm not certain if your intention is to run the testcase on all > targets or only on Power7 and above. The testcase relies on vector > "long long", which only is available with -mvsx, but the testcase only > enables -maltivec. I believe that the testcase happens to pass on the > Linux platforms you tested because GCC defaulted to Power7 or Power8 > ISA and the ABI specifies VSX. The testcase probably needs to be > restricted to only run on some level of VSX enabled processor (VSX? > Power8? Power9?) and also needs some additional compiler options when > compiling the testcase instead of relying upon the default > configuration of the compiler.
P8BE: gcc/testsuite/gcc/gcc.sum(it didn't run before due to no 'dg-do run'): Running target unix/-m32 Running /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors) PASS: gcc.target/powerpc/pr79251-run.c execution test === gcc Summary for unix/-m32 === # of expected passes 2 Running target unix/-m64 Running /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors) PASS: gcc.target/powerpc/pr79251-run.c execution test === gcc Summary for unix/-m64 === # of expected passes 2 How did you get the failure of pr79251-run.c, please? I tested it all passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE of Linux. This case is just verifying the *functionality* of "u = vec_insert (254, v, k)" and compare whether u[k] is changed to 254, it must work on all platforms, no matter with the optimization or not, otherwise there is a functional error. As to "long long", add target vsx_hw and powerpc like below? (Also change the -maltive to -mvsx for pr79251.p8.c/pr79251.p9.c.) --- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c +++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c @@ -1,4 +1,6 @@ -/* { dg-options "-O2 -maltivec" } */ +/* { dg-do run { target powerpc*-*-* } } */ +/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */ +/* { dg-options "-O2 -mvsx" } */ Any other options necessary to limit the testcases? :) > > Also, part of the change seems to be > >> - if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8) >> - rs6000_expand_vector_set_var_p9 (target, val, idx); >> + if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8) >> + { >> + rs6000_expand_vector_set_var_p9 (target, val, elt_rtx); >> + return; >> + } > > Does the P9 case need TARGET_POWERPC64? This optimization seemed to > be functioning on P9 in 32 bit mode prior to this fix. It would be a > shame to unnecessarily disable this optimization in 32 bit mode. Or > maybe it generated a functioning sequence but didn't utilize the > optimization. Would you please check / clarify? >> - if (TARGET_P8_VECTOR) >> + if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT) >> { >> stmt = build_array_ref (loc, stmt, arg2); >> stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt, This change in rs6000-c.c causes it not generating VIEW_CONVERT_EXPR(ARRAY_REF) gimple code again for P9-32bit, then the IFN VEC_SET won't be matched, so rs6000.c:rs6000_expand_vector_set_var_p9 won't be called to produce optimized "lvsl+xxperm+lvsr" for P9-32bit again. It's a pity, but without this, it ICEs on P8BE-32bit because of UNSPEC_SI_FROM_SF is not supported for -m32, if we need to support P9-32bit, why not also support P8-32bit as only float vec_insert ICE, is there any method could move SI from SF for P8-32bit? (I verified the m32 optimized binary and non-optimized binary for int vec_insert on P8-BE-32bit, the performance gain is also huge as 8x improvement with this patch.) rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx) { ... /* mtvsrd[wz] f0,tmp_val. */ rtx tmp_val = gen_reg_rtx (SImode); if (inner_mode == E_SFmode) emit_insn (gen_movsi_from_sf (tmp_val, val)); else tmp_val = force_reg (SImode, val); ... } Thanks Xionghu