Robin Dapp <rdapp....@gmail.com> writes:
> This adds zero else operands to masked loads and their intrinsics.
> I needed to adjust more than initially thought because we rely on
> combine for several instructions and a change in a "base" pattern
> needs to propagate to all those.

Looks less invasive than I'd feared though -- nice!

> For the lack of a better idea I used a function call property to specify
> whether a builtin needs an else operand or not.  Somebody with better
> knowledge of the aarch64 target can surely improve that.

Yeah, those flags are really for source-level/gimple-level attributes.
Would it work to pass a new parameter to use_contiguous_load instead?
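
Something like this is what I had in mind (only a sketch, untested;
"has_else" is just a placeholder name, and I'm assuming
memory_vector_mode gives the right mode for the else value):

rtx
function_expander::use_contiguous_load_insn (insn_code icode, bool has_else)
{
  /* ... existing output, memory and predicate operand setup ... */
  if (has_else)
    /* Supply an all-zeros else value for the masked load.  */
    add_input_operand (icode, CONST0_RTX (memory_vector_mode ()));
  return generate_insn (icode);
}

The LD1/LD1_COUNT expanders would then pass true rather than doing the
quick_push themselves.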

> [...]
> @@ -1505,10 +1506,16 @@ public:
>    {
>      insn_code icode;
>      if (e.vectors_per_tuple () == 1)
> -      icode = convert_optab_handler (maskload_optab,
> -                                  e.vector_mode (0), e.gp_mode (0));
> +      {
> +     icode = convert_optab_handler (maskload_optab,
> +                                    e.vector_mode (0), e.gp_mode (0));
> +     e.args.quick_push (CONST0_RTX (e.vector_mode (0)));
> +      }
>      else
> -      icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
> +      {
> +     icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
> +     e.args.quick_push (CONST0_RTX (e.tuple_mode (0)));
> +      }
>      return e.use_contiguous_load_insn (icode);
>    }
>  };

For the record, I don't think we strictly need the zeros on LD1_COUNT
and LD1NT_COUNT.  But I agree it's probably better to add them anyway,
for consistency.

(So please keep this part of the patch.  Just saying the above to show
that I'd thought about it.)

> @@ -1335,6 +1340,27 @@ (define_insn "vec_mask_load_lanes<mode><vsingle>"
>  
>  ;; Predicated load and extend, with 8 elements per 128-bit block.
> (define_insn_and_rewrite "@aarch64_load<SVE_PRED_LOAD:pred_load>_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
> +  [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
> +     (unspec:SVE_HSDI
> +       [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")
> +        (ANY_EXTEND:SVE_HSDI
> +          (unspec:SVE_PARTIAL_I
> +            [(match_operand:<SVE_PARTIAL_I:VPRED> 2 "register_operand" "Upl")
> +             (match_operand:SVE_PARTIAL_I 1 "memory_operand" "m")
> +             (match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")]
> +            SVE_PRED_LOAD))]
> +       UNSPEC_PRED_X))]
> +  "TARGET_SVE && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 
> 0"
> +  "ld1<ANY_EXTEND:s><SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vctype>, %2/z, %1"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +    operands[3] = CONSTM1_RTX (<SVE_HSDI:VPRED>mode);
> +  }
> +)
> +
> +;; Same as above without the maskload_else_operand to still allow combine to
> +;; match a sign-extended pred_mov pattern.
> +(define_insn_and_rewrite "*aarch64_load<SVE_PRED_LOAD:pred_load>_<ANY_EXTEND:optab>_mov<SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
>    [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
>       (unspec:SVE_HSDI
>         [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")

Splitting the patterns is the right thing to do, but it also makes
SVE_PRED_LOAD redundant.  The pattern above with the else operand
should use UNSPEC_LD1_SVE in place of SVE_PRED_LOAD.  The version
without should use UNSPEC_PRED_X (and I think can be an unnamed pattern,
starting with "*").

This would make SVE_PRED_LOAD and pred_load redundant, so they can
be removed.  The caller in svld1_extend_impl would no longer pass
UNSPEC_LD1_SVE.
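
Concretely, I'd expect the else-operand version to end up looking
something like this (untested, and the exact name is only my guess at
what it becomes once pred_load disappears):

(define_insn_and_rewrite "@aarch64_load_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>"
  [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
	(unspec:SVE_HSDI
	  [(match_operand:<SVE_HSDI:VPRED> 3 "general_operand" "UplDnm")
	   (ANY_EXTEND:SVE_HSDI
	     (unspec:SVE_PARTIAL_I
	       [(match_operand:<SVE_PARTIAL_I:VPRED> 2 "register_operand" "Upl")
		(match_operand:SVE_PARTIAL_I 1 "memory_operand" "m")
		(match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")]
	       UNSPEC_LD1_SVE))]
	  UNSPEC_PRED_X))]
  "TARGET_SVE && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
  "ld1<ANY_EXTEND:s><SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vctype>, %2/z, %1"
  "&& !CONSTANT_P (operands[3])"
  {
    operands[3] = CONSTM1_RTX (<SVE_HSDI:VPRED>mode);
  }
)

with the unnamed "*..._mov" variant keeping UNSPEC_PRED_X as before.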

Sorry about the churn.  Matching the load and move patterns in one go
seemed like a nice bit of factoring at the time, but this patch makes
it look like a factoring too far.

Otherwise it looks good.  Thanks for doing this.

Richard
