> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Thursday, July 4, 2024 12:46 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>; Marcus Shawcroft <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 1/2]AArch64: make aarch64_simd_vec_unpack<su>_lo_/_hi_ consistent.
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > The fix for PR18127 reworked the uxtl to zip optimization.  In doing so
> > it undid the changes in aarch64_simd_vec_unpack<su>_lo_, which now no
> > longer matches aarch64_simd_vec_unpack<su>_hi_.  It still works because
> > the RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the
> > general zero-extend RTL, and because that one is listed before the lo
> > pattern, recog picks it instead.
> >
> > This just makes aarch64_simd_vec_unpack<su>_lo_ mirror
> > aarch64_simd_vec_unpack<su>_hi_ for consistency, so that we're not
> > relying on the order of the patterns.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >     * config/aarch64/aarch64-simd.md
> >     (aarch64_simd_vec_unpack<su>_lo_<mode>): Add same split as
> >     aarch64_simd_vec_unpack<su>_hi_.
> >     * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
> >     comment.
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 01b084d8ccb53fc0f1c7b0dd8f23546c331f020a..d4026cbf0b66995104e8e40ca294faaf8d5ba847 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1904,7 +1904,7 @@ (define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
> >
> >  ;; Widening operations.
> >
> > -(define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> > +(define_insn_and_split "aarch64_simd_vec_unpack<su>_lo_<mode>"
> >    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> >          (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> >                            (match_operand:VQW 1 "register_operand" "w")
> > @@ -1912,6 +1912,19 @@ (define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> >                         )))]
> >    "TARGET_SIMD"
> >    "<su>xtl\t%0.<Vwtype>, %1.<Vhalftype>"
> > +  "&& <CODE> == ZERO_EXTEND
> > +   && aarch64_split_simd_shift_p (insn)"
> > +  [(const_int 0)]
> > +  {
> > +    /* On many cores, it is cheaper to implement UXTL using a ZIP1 with zero,
> > +       provided that the cost of the zero can be amortized over several
> > +       operations.  We'll later recombine the zero and zip if there are
> > +       not sufficient uses of the zero to make the split worthwhile.  */
> > +    rtx res = simplify_gen_subreg (<MODE>mode, operands[0], <VWIDE>mode, 0);
> > +    rtx zero = aarch64_gen_shareable_zero (<MODE>mode);
> > +    emit_insn (gen_aarch64_zip1<mode> (res, operands[1], zero));
> > +    DONE;
> > +  }
> >    [(set_attr "type" "neon_shift_imm_long")]
> >  )
> 
> I think we should just remove the pattern instead, and update users
> to generate an extension of the 64-bit lowpart.  That form is now the
> canonical one, so the current aarch64_simd_vec_unpack<su>_lo_<mode>
> pattern hard-codes unsimplified rtl.
> 

Ok, but then it's very weird to still have _hi.  The reason for these
indirections seems to be to model the hi/lo split for RTL simplifications.

The aarch64_simd_vec_unpack<su>_lo pattern also models a vec_select.  This
RTL is different from the one it gets rewritten to, which drops the
vec_select.  For example:
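
Taking <mode> == V16QI as an example (so <VWIDE> is V8HI and <VHALF> is
V8QI, with little-endian element numbering; R0 and R1 are symbolic
registers), the _lo pattern matches something like

  (set (reg:V8HI R0)
       (zero_extend:V8HI
         (vec_select:V8QI (reg:V16QI R1)
                          (parallel [0 1 2 3 4 5 6 7]))))

whereas the canonical lowpart form being suggested would be just

  (set (reg:V8HI R0)
       (zero_extend:V8HI (subreg:V8QI (reg:V16QI R1) 0)))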

So dropping it means we lose this, and I guess maybe fold-rtx does it now?
It's not immediately clear to me that dropping the additional RTL for recog
is beneficial.
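
For reference, the split in the patch implements the extension as a permute
with a shared zero; e.g. for a V16QI input on little-endian (register
numbers here are illustrative only), something like

  uxtl    v0.8h, v1.8b

becomes

  movi    v31.4s, #0
  zip1    v0.16b, v1.16b, v31.16b

where the movi is meant to be shared between several such extensions.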

Thanks,
Tamar

> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index ae7e21d90b2aeec51b7626471ccf7f036fa9b3db..6f49d1482042efabedbe723aa59ecf129b84f4ad 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -23159,7 +23159,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
> >     to split without that restriction and instead recombine shared zeros
> >     if they turn out not to be worthwhile.  This would allow splits in
> >     single-block functions and would also cope more naturally with
> > -   rematerialization.  */
> > +   rematerialization.  The downside of not doing this is that we lose the
> > +   optimizations for vector epilogues as well.  */
> >
> >  bool
> >  aarch64_split_simd_shift_p (rtx_insn *insn)
> 
> This part is ok.  late-combine was supposed to help with this,
> but we might need to update the propagation code so that it handles
> changes in mode.
> 
> Thanks,
> Richard
