On Tue, 3 Nov 2020 at 12:17, Dennis Zhang via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi Richard, > > On 10/30/20 2:07 PM, Richard Sandiford wrote: > > Dennis Zhang <dennis.zh...@arm.com> writes: > >> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > >> b/gcc/config/aarch64/aarch64-simd-builtins.def > >> index 332a0b6b1ea..39ebb776d1d 100644 > >> --- a/gcc/config/aarch64/aarch64-simd-builtins.def > >> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > >> @@ -719,6 +719,9 @@ > >> VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf) > >> VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf) > >> > >> + /* Implemented by aarch64_vget_halfv8bf. */ > >> + VAR1 (GETREG, vget_half, 0, ALL, v8bf) > > > > This should be AUTO_FP, since it doesn't have any side-effects. > > (As before, we should probably rename the flag, but that's separate work.) > > > >> + > >> /* Implemented by aarch64_simd_<sur>mmlav16qi. */ > >> VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi) > >> VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi) > >> diff --git a/gcc/config/aarch64/aarch64-simd.md > >> b/gcc/config/aarch64/aarch64-simd.md > >> index 9f0e2bd1e6f..f62c52ca327 100644 > >> --- a/gcc/config/aarch64/aarch64-simd.md > >> +++ b/gcc/config/aarch64/aarch64-simd.md > >> @@ -7159,6 +7159,19 @@ > >> [(set_attr "type" "neon_dot<VDQSF:q>")] > >> ) > >> > >> +;; vget_low/high_bf16 > >> +(define_expand "aarch64_vget_halfv8bf" > >> + [(match_operand:V4BF 0 "register_operand") > >> + (match_operand:V8BF 1 "register_operand") > >> + (match_operand:SI 2 "aarch64_zero_or_1")] > >> + "TARGET_BF16_SIMD" > >> +{ > >> + int hbase = INTVAL (operands[2]); > >> + rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1); > > > > I think this needs to be: > > > > aarch64_simd_vect_par_cnst_half > > > > instead. The issue is that on big-endian targets, GCC assumes vector > > lane 0 is in the high part of the register, whereas for AArch64 it's > > always in the low part of the register. So we convert from AArch64 > > numbering to GCC numbering when generating the rtx and then take > > endianness into account when matching the rtx later. > > > > It would be good to have -mbig-endian tests that make sure we generate > > the right instruction for each function (i.e. we get them the right way > > round). I guess it would be good to test that for little-endian too. > > > > I've updated the expander using aarch64_simd_vect_par_cnst_half. > And the expander is divided into two for getting low and high half > seperately. > It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu > targets with new tests including -mbig-endian option. >
Hi, When testing with a cross x86_64 -> aarch64-none-linux-gnu, the new big-endian test fails: FAIL: gcc.target/aarch64/advsimd-intrinsics/bf16_get-be.c -O0 (test for excess errors) Excess errors: /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-aarch64-none-linux-gnu/usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-lp64_be.h: No such file or directory compilation terminated. What am I missing, since it works for you? Thanks Christophe > >> + emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel)); > >> + DONE; > >> +}) > >> + > >> ;; bfmmla > >> (define_insn "aarch64_bfmmlaqv4sf" > >> [(set (match_operand:V4SF 0 "register_operand" "=w") > >> diff --git a/gcc/config/aarch64/predicates.md > >> b/gcc/config/aarch64/predicates.md > >> index 215fcec5955..0c8bc2b0c73 100644 > >> --- a/gcc/config/aarch64/predicates.md > >> +++ b/gcc/config/aarch64/predicates.md > >> @@ -84,6 +84,10 @@ > >> (ior (match_test "op == constm1_rtx") > >> (match_test "op == const1_rtx")))))) > >> > >> +(define_predicate "aarch64_zero_or_1" > >> + (and (match_code "const_int") > >> + (match_test "op == const0_rtx || op == const1_rtx"))) > > > > zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one. > > But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1, > > so let's keep it as-is. > > > > This predicate is removed since there is no need of the imm operand in > the new expanders. > > Thanks for the reviews. > Is it OK for trunk now? > > Cheers > Dennis > >