Spencer Abson <spencer.ab...@arm.com> writes:
> @@ -27292,10 +27291,16 @@ aarch64_emit_sve_invert_fp_cond (rtx target, rtx_code code, rtx pred,
>  void
>  aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1)
>  {
> -  machine_mode pred_mode = GET_MODE (target);
>    machine_mode data_mode = GET_MODE (op0);
> +  rtx pred = aarch64_sve_fp_pred (data_mode, nullptr);
>  
> -  rtx ptrue = aarch64_ptrue_reg (pred_mode);
> +  /* The governing and destination modes.  */
> +  machine_mode pred_mode = GET_MODE (pred);
> +  machine_mode target_mode = GET_MODE (target);
> +
> +  /* Also determines SVE_KNOWN_PTRUE, since an unnatural GP from
> +     sve_fp_pred would disable part of the operation.   */
> +  bool natural_p = pred_mode == target_mode;

I'm not sure about calling this "unnatural".  The SVE predicate system
was designed with this kind of flexibility in mind.  How about:

  /* For partial vector modes, the choice of predicate mode depends
     on whether we need to suppress exceptions for inactive elements.
     If we do need to suppress exceptions, the predicate mode matches
     the element size rather than the container size and the predicate
     marks the upper bits in each container as inactive.  The predicate
     is then a ptrue wrt target_mode but not wrt pred_mode.  It is the
     ptrueness wrt pred_mode that matters here.

     If we don't need to suppress exceptions, the predicate mode matches
     the container size, pred_mode == target_mode, and the predicate is
     thus a ptrue wrt both target_mode and pred_mode.  */
  bool known_ptrue_p = pred_mode == target_mode;

There again, maybe my comment makes no sense to anyone other than me,
so please do push back if you have a better suggestion!
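
For concreteness, here is roughly what I mean for a partial mode such as
VNx2SF (SF elements in 64-bit containers), assuming -- as in the code
above -- that flag_trapping_math is what decides whether exceptions for
the inactive lanes need to be suppressed.  Untested, purely for
illustration:

  rtx pred = aarch64_sve_fp_pred (VNx2SFmode, nullptr);
  if (flag_trapping_math)
    /* Element-sized predicate: only the low SF lane of each 64-bit
       container is active, so it is a ptrue wrt VNx2BI (target_mode)
       but not wrt VNx4BI (pred_mode).  */
    gcc_checking_assert (GET_MODE (pred) == VNx4BImode);
  else
    /* Container-sized ptrue: pred_mode == target_mode == VNx2BI,
       so it is a ptrue wrt both.  */
    gcc_checking_assert (GET_MODE (pred) == VNx2BImode);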

>    switch (code)
>      {
>      case UNORDERED:
> [...]
> @@ -27333,11 +27338,21 @@ aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1)
>      case UNGE:
>        if (flag_trapping_math)
>       {
> -       /* Work out which elements are ordered.  */
> -       rtx ordered = gen_reg_rtx (pred_mode);
>         op1 = force_reg (data_mode, op1);
> -       aarch64_emit_sve_invert_fp_cond (ordered, UNORDERED,
> -                                        ptrue, true, op0, op1);
> +
> +       /* Work out which elements are unordered.  */
> +       rtx uo_tmp = gen_reg_rtx (target_mode);
> +       aarch64_emit_sve_fp_cond (uo_tmp, UNORDERED, pred, natural_p,
> +                                 op0, op1);
> +
> +       /* Invert the result.  Use PRED again to maintain the intended
> +          trapping behavior.  */
> +       if (!natural_p)
> +         uo_tmp = gen_lowpart (pred_mode, uo_tmp);

The !natural_p test isn't necessary here, and IMO it's slightly easier
to follow the code without it.  The lowpart is a natural component of
the following XOR/NOT operation and is necessary to make the operation
well-typed.
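
I.e. just something like the below (untested), keeping the XOR for now
and using the known_ptrue_p name suggested above:

  op1 = force_reg (data_mode, op1);

  /* Work out which elements are unordered.  */
  rtx uo_tmp = gen_reg_rtx (target_mode);
  aarch64_emit_sve_fp_cond (uo_tmp, UNORDERED, pred, known_ptrue_p,
                            op0, op1);

  /* Invert the result.  Use PRED again to maintain the intended
     trapping behavior; the lowpart makes the operation well-typed
     and is a no-op when pred_mode == target_mode.  */
  uo_tmp = gen_lowpart (pred_mode, uo_tmp);
  rtx ordered = gen_reg_rtx (pred_mode);
  emit_insn (gen_aarch64_pred_z (XOR, pred_mode,
                                 ordered, pred, pred, uo_tmp));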

> +
> +       rtx ordered = gen_reg_rtx (pred_mode);
> +       emit_insn (gen_aarch64_pred_z (XOR, pred_mode,
> +                                      ordered, pred, pred, uo_tmp));

Although the underlying instruction is an EOR, (and (xor a b) a) isn't
canonical rtl: it should be folded to (and (not b) a) instead.

So I think we should rename:

;; Predicated predicate inverse.
(define_insn "*one_cmpl<mode>3"
  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
        (and:PRED_ALL
          (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
          (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
  "TARGET_SVE"
  "not\t%0.b, %1/z, %2.b"
)

to "@aarch64_pred_one_cmpl<mode>_z" and use gen_aarch64_pred_one_compl_z
here.  (Not the most elegant instruction name, but I suppose we should
be consistent...)
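
The emission above would then become something like (modulo whatever
name we settle on):

  emit_insn (gen_aarch64_pred_one_cmpl_z (pred_mode, ordered,
                                          pred, uo_tmp));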

This will need updates to the testcase to match NOT rather than EOR.

OK with those changes, thanks.

Richard
