> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Thursday, July 4, 2024 12:46 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>;
> Richard Earnshaw <richard.earns...@arm.com>;
> Marcus Shawcroft <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 1/2]AArch64: make aarch64_simd_vec_unpack<su>_lo_/_hi_
> consistent.
>
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > The fix for PR18127 reworked the uxtl-to-zip optimization.  In doing so
> > it undid the changes in aarch64_simd_vec_unpack<su>_lo_, which now no
> > longer matches aarch64_simd_vec_unpack<su>_hi_.  Everything still works
> > because the RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps
> > with the general zero-extend RTL and, since that pattern is listed
> > before the lo pattern, recog picks it instead.
> >
> > This patch makes aarch64_simd_vec_unpack<su>_lo_ mirror
> > aarch64_simd_vec_unpack<su>_hi_ for consistency, so that we no longer
> > rely on the order of the patterns.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-simd.md
> > 	(aarch64_simd_vec_unpack<su>_lo_<mode>): Add same split as
> > 	aarch64_simd_vec_unpack<su>_hi_.
> > 	* config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
> > 	comment.
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 01b084d8ccb53fc0f1c7b0dd8f23546c331f020a..d4026cbf0b66995104e8e40ca294faaf8d5ba847 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1904,7 +1904,7 @@ (define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
> >
> >  ;; Widening operations.
> >
> > -(define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> > +(define_insn_and_split "aarch64_simd_vec_unpack<su>_lo_<mode>"
> >    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> >          (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> > 			       (match_operand:VQW 1 "register_operand" "w")
> > @@ -1912,6 +1912,19 @@ (define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> >          )))]
> >    "TARGET_SIMD"
> >    "<su>xtl\t%0.<Vwtype>, %1.<Vhalftype>"
> > +  "&& <CODE> == ZERO_EXTEND
> > +   && aarch64_split_simd_shift_p (insn)"
> > +  [(const_int 0)]
> > +  {
> > +    /* On many cores, it is cheaper to implement UXTL using a ZIP1 with
> > +       zero, provided that the cost of the zero can be amortized over
> > +       several operations.  We'll later recombine the zero and zip if
> > +       there are not sufficient uses of the zero to make the split
> > +       worthwhile.  */
> > +    rtx res = simplify_gen_subreg (<MODE>mode, operands[0], <VWIDE>mode, 0);
> > +    rtx zero = aarch64_gen_shareable_zero (<MODE>mode);
> > +    emit_insn (gen_aarch64_zip1<mode> (res, operands[1], zero));
> > +    DONE;
> > +  }
> >   [(set_attr "type" "neon_shift_imm_long")]
> > )
>
> I think we should just remove the pattern instead, and update users
> to generate an extension of the 64-bit lowpart.  That form is now the
> canonical one, so the current aarch64_simd_vec_unpack<su>_lo_<mode>
> pattern hard-codes unsimplified rtl.
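For reference, the two forms being compared look roughly like this -- a
minimal sketch, assuming a little-endian V16QI source widened to V8HI,
with purely illustrative register numbers:

    ;; What the current pattern matches: an extension of an explicit
    ;; vec_select of the low half.
    (set (reg:V8HI 32)
         (zero_extend:V8HI
           (vec_select:V8QI (reg:V16QI 33)
             (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)
                        (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))

    ;; Canonical form: an extension of the 64-bit lowpart as a subreg.
    (set (reg:V8HI 32)
         (zero_extend:V8HI (subreg:V8QI (reg:V16QI 33) 0)))

The second form is presumably what simplification produces from the
first on little-endian targets, which would explain the overlap with the
general zero-extend pattern described above.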
Ok, but then it's very weird to still have _hi.  The reason for these
indirections seems to be to model the hi/lo split for RTL
simplifications.  aarch64_simd_vec_unpack<su>_lo models a vec_select as
well, and that RTL is different from the one it gets rewritten to (which
drops the vec_select).  So dropping the pattern means we lose this
modelling, and I guess maybe fold-rtx does it now?  It's not immediately
clear to me that dropping the additional RTL for recog is beneficial.

Thanks,
Tamar

> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index ae7e21d90b2aeec51b7626471ccf7f036fa9b3db..6f49d1482042efabedbe723aa59ecf129b84f4ad 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -23159,7 +23159,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
> >     to split without that restriction and instead recombine shared zeros
> >     if they turn out not to be worthwhile.  This would allow splits in
> >     single-block functions and would also cope more naturally with
> > -   rematerialization.  */
> > +   rematerialization.  The downside of not doing this is that we lose the
> > +   optimizations for vector epilogues as well.  */
> >
> >  bool
> >  aarch64_split_simd_shift_p (rtx_insn *insn)
>
> This part is ok.  late-combine was supposed to help with this,
> but we might need to update the propagation code so that it handles
> changes in mode.
>
> Thanks,
> Richard
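For reference, the uxtl-to-zip rewrite that the split performs, as a
minimal sketch in AArch64 assembly; little-endian is assumed and the
register choices are illustrative:

    uxtl    v0.8h, v1.8b             // zero-extend low 8 bytes to 8 halfwords

becomes, when the zero can be shared across several such widenings:

    movi    v31.4s, #0               // shareable zero vector
    zip1    v0.16b, v1.16b, v31.16b  // interleave with zero == zero-extend

The zip1 form costs an extra instruction to materialize the zero, which
is why the split is only worthwhile when the zero is amortized over
several operations -- and why losing it for vector epilogues, as the
updated comment notes, is a real cost.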