Robin Dapp <rdapp....@gmail.com> writes:
> This adds zero else operands to masked loads and their intrinsics.
> I needed to adjust more than initially thought because we rely on
> combine for several instructions and a change in a "base" pattern
> needs to propagate to all those.
Looks less invasive than I'd feared though -- nice!

> For the lack of a better idea I used a function call property to specify
> whether a builtin needs an else operand or not. Somebody with better
> knowledge of the aarch64 target can surely improve that.

Yeah, those flags are really for source-level/gimple-level attributes.
Would it work to pass a new parameter to use_contiguous_load instead?

> [...]
> @@ -1505,10 +1506,16 @@ public:
>    {
>      insn_code icode;
>      if (e.vectors_per_tuple () == 1)
> -      icode = convert_optab_handler (maskload_optab,
> -                                     e.vector_mode (0), e.gp_mode (0));
> +      {
> +        icode = convert_optab_handler (maskload_optab,
> +                                       e.vector_mode (0), e.gp_mode (0));
> +        e.args.quick_push (CONST0_RTX (e.vector_mode (0)));
> +      }
>      else
> -      icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
> +      {
> +        icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
> +        e.args.quick_push (CONST0_RTX (e.tuple_mode (0)));
> +      }
>      return e.use_contiguous_load_insn (icode);
>    }
> };

For the record, I don't think we strictly need the zeros on LD1_COUNT
and LD1NT_COUNT. But I agree it's probably better to add them anyway,
for consistency. (So please keep this part of the patch. Just saying
the above to show that I'd thought about it.)

> @@ -1335,6 +1340,27 @@ (define_insn "vec_mask_load_lanes<mode><vsingle>"
>
>  ;; Predicated load and extend, with 8 elements per 128-bit block.
>  (define_insn_and_rewrite "@aarch64_load<SVE_PRED_LOAD:pred_load>_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
> +  [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
> +        (unspec:SVE_HSDI
> +          [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")
> +           (ANY_EXTEND:SVE_HSDI
> +             (unspec:SVE_PARTIAL_I
> +               [(match_operand:<SVE_PARTIAL_I:VPRED> 2 "register_operand" "Upl")
> +                (match_operand:SVE_PARTIAL_I 1 "memory_operand" "m")
> +                (match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")]
> +               SVE_PRED_LOAD))]
> +          UNSPEC_PRED_X))]
> +  "TARGET_SVE && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
> +  "ld1<ANY_EXTEND:s><SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vctype>, %2/z, %1"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +    operands[3] = CONSTM1_RTX (<SVE_HSDI:VPRED>mode);
> +  }
> +)
> +
> +;; Same as above without the maskload_else_operand to still allow combine to
> +;; match a sign-extended pred_mov pattern.
> +(define_insn_and_rewrite "*aarch64_load<SVE_PRED_LOAD:pred_load>_<ANY_EXTEND:optab>_mov<SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
>    [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
>          (unspec:SVE_HSDI
>            [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")

Splitting the patterns is the right thing to do, but it also makes
SVE_PRED_LOAD redundant. The pattern above with the else operand should
use UNSPEC_LD1_SVE in place of SVE_PRED_LOAD. The version without should
use UNSPEC_PRED_X (and I think can be an unnamed pattern, starting
with "*").

This would make SVE_PRED_LOAD and pred_load redundant, so they can be
removed. The caller in svld1_extend_impl would no longer pass
UNSPEC_LD1_SVE.

Sorry about the churn. Matching the load and move patterns in one go
seemed like a nice bit of factoring at the time, but this patch makes
it look like a factoring too far.

Otherwise it looks good. Thanks for doing this.

Richard
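
[For illustration only: a minimal sketch of the parameter-based interface
suggested above, built from the names in the quoted hunk. The flag name
zero_else_p, its default value, and the idea of pushing the zero inside
use_contiguous_load_insn are assumptions made for this sketch, not part
of the patch or of the existing aarch64-sve-builtins code.]

    /* Sketch: let the caller state whether the chosen pattern takes an
       else operand, instead of keying off a function call property.
       The zero else operand would then be added inside
       use_contiguous_load_insn rather than by each builtin's expander.  */
    rtx use_contiguous_load_insn (insn_code icode, bool zero_else_p = false);

    /* The call site from the quoted hunk, adjusted to match: the expander
       picks the pattern as before and simply passes the flag through.  */
    insn_code icode;
    if (e.vectors_per_tuple () == 1)
      icode = convert_optab_handler (maskload_optab,
                                     e.vector_mode (0), e.gp_mode (0));
    else
      icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
    return e.use_contiguous_load_insn (icode, /*zero_else_p=*/true);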