This patch is okay with the removal of

{ target powerpc*-*-* }

from the pr79251-run.c testcase directives.

As I explained in the earlier email, I still believe that the testcase
is not testing what you intend, but this patch is a definite
improvement and removes the failures.  We can correct the testcase in
a follow-up patch.

Thanks for the clarification about P9 support.  32 bit doesn't have a
fast mechanism to move SImode to SFmode.

Thanks, David

On Tue, Jan 26, 2021 at 10:56 PM Xionghu Luo <luo...@linux.ibm.com> wrote:
>
> Hi,
>
> On 2021/1/27 03:00, David Edelsohn wrote:
> > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo <luo...@linux.ibm.com> wrote:
> >>
> >> From: "luo...@cn.ibm.com" <luo...@cn.ibm.com>
> >>
> >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
> >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
> >> variable vector insert.  Remove rs6000_expand_vector_set_var helper
> >> function, adjust the p8 and p9 definitions position and make them
> >> static.
> >>
> >> The previous commit r11-6858 missed check m32, This patch is tested pass
> >> on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
> >> RUNTESTFLAGS="--target_board =unix'{-m32,-m64}" for BE targets.
> >
> > Hi, Xionghu
> >
> > Thanks for addressing these failures and the cleanups.
> >
> > This patch addresses most of the failures.
> >
> > pr79251-run.c continues to fail.  The directives are not complete.
> > I'm not certain if your intention is to run the testcase on all
> > targets or only on Power7 and above.  The testcase relies on vector
> > "long long", which only is available with -mvsx, but the testcase only
> > enables -maltivec.  I believe that the testcase happens to pass on the
> > Linux platforms you tested because GCC defaulted to Power7 or Power8
> > ISA and the ABI specifies VSX.  The testcase probably needs to be
> > restricted to only run on some level of VSX enabled processor (VSX?
> > Power8? Power9?) and also needs some additional compiler options when
> > compiling the testcase instead of relying upon the default
> > configuration of the compiler.
>
>
> P8BE: gcc/testsuite/gcc/gcc.sum(it didn't run before due to no 'dg-do run'):
>
> Running target unix/-m32
> Running 
> /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
> PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
> PASS: gcc.target/powerpc/pr79251-run.c execution test
>                 === gcc Summary for unix/-m32 ===
>
> # of expected passes            2
> Running target unix/-m64
> Running 
> /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
> PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
> PASS: gcc.target/powerpc/pr79251-run.c execution test
>                 === gcc Summary for unix/-m64 ===
>
> # of expected passes            2
>
>
> How did you get the failure of pr79251-run.c, please?  I tested it all
> passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE of Linux.  This case is
> just verifying the *functionality* of "u = vec_insert (254, v, k)" and
> compare whether u[k] is changed to 254, it must work on all platforms,
> no matter with the optimization or not, otherwise there is a functional
> error.  As to "long long", add target vsx_hw and powerpc like below?
> (Also change the -maltive to -mvsx for pr79251.p8.c/pr79251.p9.c.)
>
> --- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
> @@ -1,4 +1,6 @@
> -/* { dg-options "-O2 -maltivec" } */
> +/* { dg-do run { target powerpc*-*-* } } */
> +/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */
> +/* { dg-options "-O2 -mvsx" } */
>
>
> Any other options necessary to limit the testcases? :)
>
> >
> > Also, part of the change seems to be
> >
> >> -  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
> >> -    rs6000_expand_vector_set_var_p9 (target, val, idx);
> >> +         if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8)
> >> +           {
> >> +             rs6000_expand_vector_set_var_p9 (target, val, elt_rtx);
> >> +             return;
> >> +           }
> >
> > Does the P9 case need TARGET_POWERPC64?  This optimization seemed to
> > be functioning on P9 in 32 bit mode prior to this fix.  It would be a
> > shame to unnecessarily disable this optimization in 32 bit mode.  Or
> > maybe it generated a functioning sequence but didn't utilize the
> > optimization.  Would you please check / clarify?
>
>
> >> -      if (TARGET_P8_VECTOR)
> >> +      if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT)
> >>          {
> >>            stmt = build_array_ref (loc, stmt, arg2);
> >>            stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
>
>
> This change in rs6000-c.c causes it not generating 
> VIEW_CONVERT_EXPR(ARRAY_REF)
> gimple code again for P9-32bit, then the IFN VEC_SET won't be matched,
> so rs6000.c:rs6000_expand_vector_set_var_p9 won't be called to produce
> optimized "lvsl+xxperm+lvsr" for P9-32bit again.  It's a pity, but without
> this, it ICEs on P8BE-32bit because of UNSPEC_SI_FROM_SF is not supported
> for -m32, if we need to support P9-32bit, why not also support P8-32bit as
> only float vec_insert ICE, is there any method could move SI from SF for 
> P8-32bit?
> (I verified the m32 optimized binary and non-optimized binary for int 
> vec_insert
> on P8-BE-32bit, the performance gain is also huge as 8x improvement with this 
> patch.)
>
>
> rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
> {
> ...
>   /*  mtvsrd[wz] f0,tmp_val.  */
>   rtx tmp_val = gen_reg_rtx (SImode);
>   if (inner_mode == E_SFmode)
>     emit_insn (gen_movsi_from_sf (tmp_val, val));
>   else
>     tmp_val = force_reg (SImode, val);
> ...
> }
>
>
> Thanks
> Xionghu
>
>

Reply via email to