Dennis Zhang <dennis.zh...@arm.com> writes:
> Hi Richard,
>
> On 10/30/20 2:07 PM, Richard Sandiford wrote:
>> Dennis Zhang <dennis.zh...@arm.com> writes:
>>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
>>> index 332a0b6b1ea..39ebb776d1d 100644
>>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>>> @@ -719,6 +719,9 @@
>>>    VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
>>>    VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
>>>
>>> +  /* Implemented by aarch64_vget_halfv8bf.  */
>>> +  VAR1 (GETREG, vget_half, 0, ALL, v8bf)
>>
>> This should be AUTO_FP, since it doesn't have any side-effects.
>> (As before, we should probably rename the flag, but that's separate
>> work.)
>>
>>> +
>>>    /* Implemented by aarch64_simd_<sur>mmlav16qi.  */
>>>    VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
>>>    VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>> index 9f0e2bd1e6f..f62c52ca327 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -7159,6 +7159,19 @@
>>>    [(set_attr "type" "neon_dot<VDQSF:q>")]
>>>  )
>>>
>>> +;; vget_low/high_bf16
>>> +(define_expand "aarch64_vget_halfv8bf"
>>> +  [(match_operand:V4BF 0 "register_operand")
>>> +   (match_operand:V8BF 1 "register_operand")
>>> +   (match_operand:SI 2 "aarch64_zero_or_1")]
>>> +  "TARGET_BF16_SIMD"
>>> +{
>>> +  int hbase = INTVAL (operands[2]);
>>> +  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
>>
>> I think this needs to be:
>>
>>   aarch64_simd_vect_par_cnst_half
>>
>> instead.  The issue is that on big-endian targets, GCC assumes vector
>> lane 0 is in the high part of the register, whereas for AArch64 it's
>> always in the low part of the register.  So we convert from AArch64
>> numbering to GCC numbering when generating the rtx and then take
>> endianness into account when matching the rtx later.
>>
>> It would be good to have -mbig-endian tests that make sure we generate
>> the right instruction for each function (i.e. we get them the right
>> way round).  I guess it would be good to test that for little-endian
>> too.
>>
>
> I've updated the expander to use aarch64_simd_vect_par_cnst_half,
> and split it into two expanders that get the low and high halves
> separately.
> This is tested for the aarch64-none-linux-gnu and
> aarch64_be-none-linux-gnu targets, with new tests that include the
> -mbig-endian option.
>
>>> +  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
>>> +  DONE;
>>> +})
>>> +
>>>  ;; bfmmla
>>>  (define_insn "aarch64_bfmmlaqv4sf"
>>>    [(set (match_operand:V4SF 0 "register_operand" "=w")
>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>> index 215fcec5955..0c8bc2b0c73 100644
>>> --- a/gcc/config/aarch64/predicates.md
>>> +++ b/gcc/config/aarch64/predicates.md
>>> @@ -84,6 +84,10 @@
>>>    (ior (match_test "op == constm1_rtx")
>>>         (match_test "op == const1_rtx"))))))
>>>
>>> +(define_predicate "aarch64_zero_or_1"
>>> +  (and (match_code "const_int")
>>> +       (match_test "op == const0_rtx || op == const1_rtx")))
>>
>> zero_or_1 looked odd to me; it feels like it should be 0_or_1 or
>> zero_or_one.  But I see that it's for consistency with
>> aarch64_reg_zero_or_m1_or_1, so let's keep it as-is.
>>
>
> This predicate is removed since there is no need for the immediate
> operand in the new expanders.
>
> Thanks for the reviews.
> Is it OK for trunk now?
Looks good.  OK for trunk and branches, thanks.

Richard
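
Since the v2 patch itself is not quoted in the thread, here is a minimal
sketch of what the two split expanders Dennis describes might look like.
It reuses the gen_aarch64_get_halfv8bf call from the quoted v1 code and
follows Richard's suggestion to use aarch64_simd_vect_par_cnst_half; the
expander names aarch64_vget_lo_halfv8bf and aarch64_vget_hi_halfv8bf are
assumptions, not taken from the thread.

;; Hedged sketch only -- not the committed patch.  The expander names
;; are assumptions.  aarch64_simd_vect_par_cnst_half builds the
;; lane-selection PARALLEL in GCC's lane numbering, taking
;; BYTES_BIG_ENDIAN into account; its third argument selects the
;; high half.
(define_expand "aarch64_vget_lo_halfv8bf"
  [(match_operand:V4BF 0 "register_operand")
   (match_operand:V8BF 1 "register_operand")]
  "TARGET_BF16_SIMD"
{
  rtx sel = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
  DONE;
})

(define_expand "aarch64_vget_hi_halfv8bf"
  [(match_operand:V4BF 0 "register_operand")
   (match_operand:V8BF 1 "register_operand")]
  "TARGET_BF16_SIMD"
{
  rtx sel = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
  DONE;
})

As a usage example, a -mbig-endian test of the kind Richard asks for
could look roughly like the following; the directives and the scan
pattern are illustrative assumptions, not the actual tests from the
patch:

/* Hypothetical test sketch; the directives and scan pattern below
   are assumptions.  */
/* { dg-do compile } */
/* { dg-options "-O2 -mbig-endian" } */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */

#include <arm_neon.h>

bfloat16x4_t test_low (bfloat16x8_t a)
{
  return vget_low_bf16 (a);
}

bfloat16x4_t test_high (bfloat16x8_t a)
{
  return vget_high_bf16 (a);
}

/* Extracting the high half should move the architecturally-high
   64 bits into a D register, e.g. with a DUP; the exact mnemonic
   is an assumption and should match on both endiannesses.  */
/* { dg-final { scan-assembler {\tdup\t} } } */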