On Mon, 9 Oct 2023, Robin Dapp wrote:

> > Hmm, the function is called at transform time so this shouldn't help
> > avoiding the ICE.  I expected we refuse to vectorize _any_ reduction
> > when sign dependent rounding is in effect?  OTOH maybe sign-dependent
> > rounding is OK but only when we use an unconditional fold-left
> > (so a loop mask from fully masking is OK but not an original COND_ADD?).
> 
> So we currently only disable the use of partial vectors
> 
>       else if (reduction_type == FOLD_LEFT_REDUCTION
>              && reduc_fn == IFN_LAST

aarch64 probably chokes because reduc_fn is not IFN_LAST.

>              && FLOAT_TYPE_P (vectype_in)
>              && HONOR_SIGNED_ZEROS (vectype_in)

so with your change we'd support signed zeros correctly.

>              && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>       {
>         if (dump_enabled_p ())
>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                            "can't operate on partial vectors because"
>                            " signed zeros cannot be preserved.\n");
>         LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> 
> which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block.
> 
> For the fully masked case we continue (and then fail the assertion
> on aarch64 at transform time).
> 
> I didn't get why that case is ok, though?  We still merge the initial
> definition with the identity/neutral op (i.e. possibly -0.0) based on
> the loop mask.  Is that different to partial masking?

I think the main point with my earlier change is that without
native support for a fold-left reduction (like on x86) we get

 ops = mask ? ops : neutral;
 acc += ops[0];
 acc += ops[1];
 ...

so we wouldn't use a COND_ADD but instead add neutral elements for the
masked lanes.  That's OK for signed zeros after your change (great)
but not for sign-dependent rounding, because then we can't decide on
the sign of the neutral zero.

For the case of using an internal function, i.e. direct target support,
sign-dependent rounding should be OK if we can use the masked fold-left
reduction op.  As we do

      /* On the first iteration the input is simply the scalar phi
         result, and for subsequent iterations it is the output of
         the preceding operation.  */
      if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
        {
          if (mask && len && mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
            new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
                                                   def0, mask, len, bias);
          else if (mask && mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
            new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
                                                   def0, mask);
          else
            new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var,
                                                   def0);

in the last case we should be able to assert
!HONOR_SIGN_DEPENDENT_ROUNDING (likewise for the reduc_fn == IFN_LAST
case).
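For reference, a hypothetical scalar model of what the masked fold-left
variants compute (the name is illustrative, not the GIMPLE ISA): inactive
lanes are simply skipped, so no neutral element is ever folded into the
accumulator and both signed zeros and sign-dependent rounding are
preserved.

```c
#include <stdbool.h>

/* Illustrative scalar model of IFN_MASK_FOLD_LEFT_PLUS-like semantics:
   accumulate the active lanes strictly in order; masked-out lanes
   contribute nothing, so no neutral zero ever touches the result.  */
static double
mask_fold_left_plus (double acc, const double *ops,
                     const bool *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])
      acc += ops[i];
  return acc;
}
```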

The quoted condition above should be changed to drop the
HONOR_SIGNED_ZEROS check, and the reduc_fn == IFN_LAST test should
change as well, maybe to internal_fn_mask_index (reduc_fn) == -1?

Richard.
