Dennis Zhang <dennis.zh...@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 332a0b6b1ea..39ebb776d1d 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -719,6 +719,9 @@
>    VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
>    VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
>  
> +  /* Implemented by aarch64_vget_halfv8bf.  */
> +  VAR1 (GETREG, vget_half, 0, ALL, v8bf)

This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)

> +
>    /* Implemented by aarch64_simd_<sur>mmlav16qi.  */
>    VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
>    VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 9f0e2bd1e6f..f62c52ca327 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7159,6 +7159,19 @@
>    [(set_attr "type" "neon_dot<VDQSF:q>")]
>  )
>  
> +;; vget_low/high_bf16
> +(define_expand "aarch64_vget_halfv8bf"
> +  [(match_operand:V4BF 0 "register_operand")
> +   (match_operand:V8BF 1 "register_operand")
> +   (match_operand:SI 2 "aarch64_zero_or_1")]
> +  "TARGET_BF16_SIMD"
> +{
> +  int hbase = INTVAL (operands[2]);
> +  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);

I think this needs to be:

  aarch64_simd_vect_par_cnst_half

instead.  The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register.  So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round).  I guess it would be good to test that for little-endian too.

> +  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
> +  DONE;
> +})
> +
>  ;; bfmmla
>  (define_insn "aarch64_bfmmlaqv4sf"
>    [(set (match_operand:V4SF 0 "register_operand" "=w")
> diff --git a/gcc/config/aarch64/predicates.md 
> b/gcc/config/aarch64/predicates.md
> index 215fcec5955..0c8bc2b0c73 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -84,6 +84,10 @@
>                (ior (match_test "op == constm1_rtx")
>                     (match_test "op == const1_rtx"))))))
>  
> +(define_predicate "aarch64_zero_or_1"
> +  (and (match_code "const_int")
> +       (match_test "op == const0_rtx || op == const1_rtx")))

zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.

Thanks,
Richard

Reply via email to