Hi,

Since the arm-linux toolchain build has been fixed, I have noticed
additional failures on armeb:

gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
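(For anyone reproducing this by hand: these scan-assembler-times directives are
Tcl-escaped regular expressions counted against the generated assembly.  A
quick sketch of what the first pattern does and does not match, outside the
DejaGnu harness — the register numbers here are hypothetical, not from the
actual test output:)

```python
import re

# One level of Tcl escaping removed: "\\t" in the directive is a literal
# tab in the regex DejaGnu applies (the unescaped '.' is an any-character
# wildcard in the original directive, kept as-is here).
pattern = re.compile("vdup.32\tq[0-9]+, r[0-9]+")

# A core-to-NEON duplicate counts toward the expected total...
assert pattern.search("\tvdup.32\tq8, r0")
# ...but a q-to-q operand pair does not match the r-register operand.
assert not pattern.search("\tvdup.32\tq8, q9")
```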
I don't see them mentioned in this thread though? Can you check?

Thanks,

Christophe

On Thu, Jul 15, 2021 at 3:07 PM Jonathan Wright <jonathan.wri...@arm.com> wrote:

> Ah, yes - those test results should have only been changed for little
> endian.
>
> I've submitted a patch to the list restoring the original expected
> results for big endian.
>
> Thanks,
> Jonathan
> ------------------------------
> *From:* Christophe Lyon <christophe.lyon....@gmail.com>
> *Sent:* 15 July 2021 10:09
> *To:* Richard Sandiford <richard.sandif...@arm.com>; Jonathan Wright
> <jonathan.wri...@arm.com>; gcc-patches@gcc.gnu.org
> <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> *Subject:* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
>
> On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> Jonathan Wright <jonathan.wri...@arm.com> writes:
> > Hi,
> >
> > Version 2 of this patch adds more code generation tests to show the
> > benefit of this RTL simplification, as well as adding a new helper
> > function 'rtx_vec_series_p' to reduce code duplication.
> >
> > Patch tested as version 1 - ok for master?
>
> Sorry for the slow reply.
>
> > Regression tested and bootstrapped on aarch64-none-linux-gnu,
> > x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> > aarch64_be-none-linux-gnu - no issues.
>
> I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.
>
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
> >  					- 1,
> >  					0));
> >        break;
> > +    case VEC_SELECT:
> > +      {
> > +	rtx trueop0 = XEXP (x, 0);
> > +	mode = GET_MODE (trueop0);
> > +	rtx trueop1 = XEXP (x, 1);
> > +	int nunits;
> > +	/* If we select a low-part subreg, return that.  */
> > +	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> > +	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> > +	  {
> > +	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> > +
> > +	    if (rtx_vec_series_p (trueop1, offset))
> > +	      {
> > +		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> > +		if (new_rtx != NULL_RTX)
> > +		  return new_rtx;
> > +	      }
> > +	  }
> > +      }
>
> Since this occurs three times, I think it would be worth having
> a new predicate:
>
> /* Return true if, for all OP of mode OP_MODE:
>
>      (vec_select:RESULT_MODE OP SEL)
>
>    is equivalent to the lowpart RESULT_MODE of OP.  */
>
> bool
> vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)
>
> containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
> and rtx_vec_series_p tests.
>
> I think the function belongs in rtlanal.[hc], even though
> subreg_lowpart_p is in emit-rtl.c.
>
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -1884,15 +1884,16 @@
> >  )
> >
> >  (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> > -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> > -        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
> > +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> > +        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
> >    ""
> >    "@
> >     and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
> >     ldr<SHORT:size>\t%w0, %1
> > -   ldr\t%<SHORT:size>0, %1"
> > -  [(set_attr "type" "logic_imm,load_4,f_loads")
> > -   (set_attr "arch" "*,*,fp")]
> > +   ldr\t%<SHORT:size>0, %1
> > +   umov\t%w0, %1.<SHORT:size>[0]"
> > +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> > +   (set_attr "arch" "*,*,fp,fp")]
>
> FTR (just to show I thought about it): I don't know whether the umov
> can really be considered an fp operation rather than a simd operation,
> but since we don't support fp without simd, this is already a
> distinction without a difference.  So the pattern is IMO OK as-is.
>
> > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> > index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
> > --- a/gcc/config/arm/vfp.md
> > +++ b/gcc/config/arm/vfp.md
> > @@ -224,7 +224,7 @@
> >  ;; problems because small constants get converted into adds.
> >  (define_insn "*arm_movsi_vfp"
> >    [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
> > -	(match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> > +	(match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
> >    "TARGET_ARM && TARGET_HARD_FLOAT
> >     && (   s_register_operand (operands[0], SImode)
> >         || s_register_operand (operands[1], SImode))"
>
> I'll assume that an Arm maintainer would have spoken up by now if
> they didn't want this for some reason.
>
> > diff --git a/gcc/rtl.c b/gcc/rtl.c
> > index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644
> > --- a/gcc/rtl.c
> > +++ b/gcc/rtl.c
> > @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
> >      }
> >  }
> >
> > +/* Return true if element-selection indices in VEC are in series.  */
> > +
> > +bool
> > +rtx_vec_series_p (const_rtx vec, int start)
>
> I think rtvec_series_p would be better, for consistency with
> rtvec_all_equal_p.  Also, let's generalise it to:
>
> /* Return true if VEC contains a linear series of integers
>    { START, START+1, START+2, ... }.  */
>
> bool
> rtvec_series_p (rtvec vec, int start)
> {
> }
>
> > +{
> > +  for (int i = 0; i < XVECLEN (vec, 0); i++)
> > +    {
> > +      if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> > +	return false;
> > +    }
> > +  return true;
>
> With the general definition I think this should be:
>
>   for (int i = 0; i < GET_NUM_ELEM (vec); i++)
>     {
>       rtx x = RTVEC_ELT (vec, i);
>       if (!CONST_INT_P (x) || INTVAL (x) != i + start)
>         return false;
>     }
>
> Then pass XVEC (sel, 0) to the function, instead of just sel.
>
> OK with those changes, thanks.

> Hi,
>
> Some of the updated tests fail on aarch64_be:
> gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
> gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
> gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
> gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
> gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 5
> gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 5
> gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 6
> gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 6
>
> Can you check?
>
> Thanks,
>
> Christophe

> Richard