On 11/3/20 2:05 PM, Richard Sandiford wrote:
Dennis Zhang <dennis.zh...@arm.com> writes:
Hi Richard,
On 10/30/20 2:07 PM, Richard Sandiford wrote:
Dennis Zhang <dennis.zh...@arm.com> writes:
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
+ /* Implemented by aarch64_vget_halfv8bf. */
+ VAR1 (GETREG, vget_half, 0, ALL, v8bf)
This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)
+
/* Implemented by aarch64_simd_<sur>mmlav16qi. */
VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
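For reference, Richard's AUTO_FP suggestion above would amount to something like the following .def entry (a sketch only; the final patch may name or split the entries differently once the expander is reworked):
  /* Implemented by aarch64_vget_halfv8bf.  */
  VAR1 (GETREG, vget_half, 0, AUTO_FP, v8bf)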
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
[(set_attr "type" "neon_dot<VDQSF:q>")]
)
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_halfv8bf"
+ [(match_operand:V4BF 0 "register_operand")
+ (match_operand:V8BF 1 "register_operand")
+ (match_operand:SI 2 "aarch64_zero_or_1")]
+ "TARGET_BF16_SIMD"
+{
+ int hbase = INTVAL (operands[2]);
+ rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
I think this needs to be:
aarch64_simd_vect_par_cnst_half
instead. The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register. So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.
It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round). I guess it would be good to test that for little-endian too.
I've updated the expander to use aarch64_simd_vect_par_cnst_half, and split it into two expanders so that the low and high halves are handled separately (a sketch of the split expanders follows this hunk).
It's tested for the aarch64-none-linux-gnu and aarch64_be-none-linux-gnu targets, with new tests that include the -mbig-endian option.
+ emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+ DONE;
+})
+
;; bfmmla
(define_insn "aarch64_bfmmlaqv4sf"
[(set (match_operand:V4SF 0 "register_operand" "=w")
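For reference, a minimal sketch of what the split expanders might look like when built on aarch64_simd_vect_par_cnst_half; the aarch64_vget_lo_halfv8bf/aarch64_vget_hi_halfv8bf names are assumed here rather than taken from the thread, and gen_aarch64_get_halfv8bf is the helper already used above:

;; Sketch only: the low and high halves get their own expanders, so no
;; 0/1 immediate operand is needed.
(define_expand "aarch64_vget_lo_halfv8bf"
  [(match_operand:V4BF 0 "register_operand")
   (match_operand:V8BF 1 "register_operand")]
  "TARGET_BF16_SIMD"
{
  /* "false" selects the architecturally low half; the helper emits GCC
     lane numbers, so big-endian is handled when the rtx is matched.  */
  rtx sel = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
  DONE;
})

(define_expand "aarch64_vget_hi_halfv8bf"
  [(match_operand:V4BF 0 "register_operand")
   (match_operand:V8BF 1 "register_operand")]
  "TARGET_BF16_SIMD"
{
  /* "true" selects the architecturally high half.  */
  rtx sel = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
  DONE;
})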
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
(ior (match_test "op == constm1_rtx")
(match_test "op == const1_rtx"))))))
+(define_predicate "aarch64_zero_or_1"
+ (and (match_code "const_int")
+ (match_test "op == const0_rtx || op == const1_rtx")))
zero_or_1 looked odd to me; it feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.
This predicate has been removed, since the new expanders no longer need the
immediate operand (see the intrinsic sketch below).
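For illustration, the arm_neon.h wrappers would then call one builtin each with no immediate argument, roughly along these lines (a sketch only; the __builtin_aarch64_vget_lo_halfv8bf/__builtin_aarch64_vget_hi_halfv8bf names are assumed from the expander names rather than taken from the thread):

__extension__ extern __inline bfloat16x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vget_low_bf16 (bfloat16x8_t __a)
{
  /* Assumed builtin name; one builtin per half, so no 0/1 immediate.  */
  return __builtin_aarch64_vget_lo_halfv8bf (__a);
}

__extension__ extern __inline bfloat16x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vget_high_bf16 (bfloat16x8_t __a)
{
  return __builtin_aarch64_vget_hi_halfv8bf (__a);
}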
Thanks for the reviews.
Is it OK for trunk now?
Looks good. OK for trunk and branches, thanks.
Richard
Thanks for the approval, Richard!
The patch has been committed as 3553c658533e430b232997bdfd97faf6606fb102.
Bests
Dennis