Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

Christophe Lyon via Gcc-patches Thu, 05 Nov 2020 12:08:16 -0800

On Tue, 3 Nov 2020 at 12:17, Dennis Zhang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi Richard,
>
> On 10/30/20 2:07 PM, Richard Sandiford wrote:
> > Dennis Zhang <dennis.zh...@arm.com> writes:
> >> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> >> b/gcc/config/aarch64/aarch64-simd-builtins.def
> >> index 332a0b6b1ea..39ebb776d1d 100644
> >> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> >> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> >> @@ -719,6 +719,9 @@
> >>     VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
> >>     VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
> >>
> >> +  /* Implemented by aarch64_vget_halfv8bf.  */
> >> +  VAR1 (GETREG, vget_half, 0, ALL, v8bf)
> >
> > This should be AUTO_FP, since it doesn't have any side-effects.
> > (As before, we should probably rename the flag, but that's separate work.)
> >
> >> +
> >>     /* Implemented by aarch64_simd_<sur>mmlav16qi.  */
> >>     VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
> >>     VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> >> b/gcc/config/aarch64/aarch64-simd.md
> >> index 9f0e2bd1e6f..f62c52ca327 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -7159,6 +7159,19 @@
> >>     [(set_attr "type" "neon_dot<VDQSF:q>")]
> >>   )
> >>
> >> +;; vget_low/high_bf16
> >> +(define_expand "aarch64_vget_halfv8bf"
> >> +  [(match_operand:V4BF 0 "register_operand")
> >> +   (match_operand:V8BF 1 "register_operand")
> >> +   (match_operand:SI 2 "aarch64_zero_or_1")]
> >> +  "TARGET_BF16_SIMD"
> >> +{
> >> +  int hbase = INTVAL (operands[2]);
> >> +  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
> >
> > I think this needs to be:
> >
> >    aarch64_simd_vect_par_cnst_half
> >
> > instead.  The issue is that on big-endian targets, GCC assumes vector
> > lane 0 is in the high part of the register, whereas for AArch64 it's
> > always in the low part of the register.  So we convert from AArch64
> > numbering to GCC numbering when generating the rtx and then take
> > endianness into account when matching the rtx later.
> >
> > It would be good to have -mbig-endian tests that make sure we generate
> > the right instruction for each function (i.e. we get them the right way
> > round).  I guess it would be good to test that for little-endian too.
> >
>
> I've updated the expander using aarch64_simd_vect_par_cnst_half.
> And the expander is divided into two for getting low and high half
> seperately.
> It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu
> targets with new tests including -mbig-endian option.
>


Hi,

When testing with a cross x86_64 -> aarch64-none-linux-gnu, the new
big-endian test fails:
FAIL: gcc.target/aarch64/advsimd-intrinsics/bf16_get-be.c   -O0  (test
for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-aarch64-none-linux-gnu/usr/include/gnu/stubs.h:11:11:
fatal error: gnu/stubs-lp64_be.h: No such file or directory
compilation terminated.

What am I missing, since it works for you?

Thanks

Christophe

> >> +  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
> >> +  DONE;
> >> +})
> >> +
> >>   ;; bfmmla
> >>   (define_insn "aarch64_bfmmlaqv4sf"
> >>     [(set (match_operand:V4SF 0 "register_operand" "=w")
> >> diff --git a/gcc/config/aarch64/predicates.md 
> >> b/gcc/config/aarch64/predicates.md
> >> index 215fcec5955..0c8bc2b0c73 100644
> >> --- a/gcc/config/aarch64/predicates.md
> >> +++ b/gcc/config/aarch64/predicates.md
> >> @@ -84,6 +84,10 @@
> >>               (ior (match_test "op == constm1_rtx")
> >>                    (match_test "op == const1_rtx"))))))
> >>
> >> +(define_predicate "aarch64_zero_or_1"
> >> +  (and (match_code "const_int")
> >> +       (match_test "op == const0_rtx || op == const1_rtx")))
> >
> > zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
> > But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
> > so let's keep it as-is.
> >
>
> This predicate is removed since there is no need of the imm operand in
> the new expanders.
>
> Thanks for the reviews.
> Is it OK for trunk now?
>
> Cheers
> Dennis
>
>

Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

Reply via email to