Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> The fix for PR18127 reworked the uxtl to zip optimization.
> In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_ and this
> now no longer matches aarch64_simd_vec_unpack<su>_hi_.  It still works
> because the RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with
> the general zero-extend RTL, and since that pattern is listed before the lo
> one, recog picks it instead.
>
> This just makes aarch64_simd_vec_unpack<su>_lo_ mirror
> aarch64_simd_vec_unpack<su>_hi_ for consistency and so we're not relying on
> the order of patterns.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md
>       (aarch64_simd_vec_unpack<su>_lo_<mode>): Add same split as
>       aarch64_simd_vec_unpack<su>_hi_.
>       * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
>       comment.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 01b084d8ccb53fc0f1c7b0dd8f23546c331f020a..d4026cbf0b66995104e8e40ca294faaf8d5ba847 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1904,7 +1904,7 @@ (define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
>  
>  ;; Widening operations.
>  
> -(define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> +(define_insn_and_split "aarch64_simd_vec_unpack<su>_lo_<mode>"
>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
>          (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
>                              (match_operand:VQW 1 "register_operand" "w")
> @@ -1912,6 +1912,19 @@ (define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
>                           )))]
>    "TARGET_SIMD"
>    "<su>xtl\t%0.<Vwtype>, %1.<Vhalftype>"
> +  "&& <CODE> == ZERO_EXTEND
> +   && aarch64_split_simd_shift_p (insn)"
> +  [(const_int 0)]
> +  {
> +    /* On many cores, it is cheaper to implement UXTL using a ZIP1 with zero,
> +       provided that the cost of the zero can be amortized over several
> +       operations.  We'll later recombine the zero and zip if there are
> +       not sufficient uses of the zero to make the split worthwhile.  */
> +    rtx res = simplify_gen_subreg (<MODE>mode, operands[0], <VWIDE>mode, 0);
> +    rtx zero = aarch64_gen_shareable_zero (<MODE>mode);
> +    emit_insn (gen_aarch64_zip1<mode> (res, operands[1], zero));
> +    DONE;
> +  }
>    [(set_attr "type" "neon_shift_imm_long")]
>  )

I think we should just remove the pattern instead, and update users
to generate an extension of the 64-bit lowpart.  That form is now the
canonical one, so the current aarch64_simd_vec_unpack<su>_lo_<mode>
pattern hard-codes unsimplified rtl.
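
For example (a sketch only, assuming little-endian lane order and the
V16QI->V8HI mode pair), instead of the hard-coded:

  (zero_extend:V8HI
    (vec_select:V8QI (reg:V16QI x) (parallel [0 1 2 3 4 5 6 7])))

users would generate the simplified lowpart form:

  (zero_extend:V8HI (subreg:V8QI (reg:V16QI x) 0))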

> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index ae7e21d90b2aeec51b7626471ccf7f036fa9b3db..6f49d1482042efabedbe723aa59ecf129b84f4ad 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23159,7 +23159,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
>     to split without that restriction and instead recombine shared zeros
>     if they turn out not to be worthwhile.  This would allow splits in
>     single-block functions and would also cope more naturally with
> -   rematerialization.  */
> +   rematerialization.  The downside of not doing this is that we lose the
> +   optimizations for vector epilogues as well.  */
>  
>  bool
>  aarch64_split_simd_shift_p (rtx_insn *insn)

This part is ok.  late-combine was supposed to help with this,
but we might need to update the propagation code so that it handles
changes in mode.

Thanks,
Richard
