Sorry for the slow review.

I don't know the IV-related parts well enough to review those properly,
but they looked reasonable to me.  Hopefully Richi can comment.

I'm curious though.  For:

> +  tree step = vect_dr_behavior (vinfo, dr_info)->step;
> +
> +  [...]
> +  poly_uint64 bytesize = GET_MODE_SIZE (element_mode (aggr_type));
> +  /* Since the outcome of .SELECT_VL is element size, we should adjust
> +     it into bytesize so that it can be used in address pointer variable
> +     amount IVs adjustment.  */
> +  tree tmp = fold_build2 (MULT_EXPR, len_type, loop_len,
> +                       build_int_cst (len_type, bytesize));
> +  if (tree_int_cst_sgn (step) == -1)
> +    tmp = fold_build1 (NEGATE_EXPR, len_type, tmp);

Could you not just multiply loop_len by step, probably written as:

  build_int_cst (len_type, wi::to_widest (step))

avoiding the NEGATE_EXPR and bytesize calculation?  step should
represent the step of the original scalar IV, so doing that feels
more direct.

The loop-control bits look good to me apart from one hunk:

> @@ -2737,6 +2738,14 @@ start_over:
>                       LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
>      LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
>  
> +  /* If we're using decrement IV and SELECT_VL is supported by the target.
> +     Use output of SELECT_VL to adjust IV of loop control and data reference.
> +     Note: We only use SELECT_VL on single-rgroup control.  */
> +  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
> +      && LOOP_VINFO_LENS (loop_vinfo).length () == 1
> +      && !slp)
> +    LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
> +
>    /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>       to be able to handle fewer than VF scalars, or needs to have a lower VF
>       than the main loop.  */

This test also needs to check that the target implements the select_vl
optab for the chosen iv type.  You can check that using
direct_internal_fn_supported_p.

We should also check that LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1,
since the IV update multiplies by the size in bytes.

I think it would be worth saying in more detail why we only use SELECT_VL
for single rgroups.  I assume the reason is to simplify the pointer IV
updates.  Is that right?

That is: the multiple length controls that are currently generated from
a MIN_EXPR IV should also work with a SELECT_VL IV.  The difficulty is
that an rgroup that controls N vector loads (say) would need N pointer
updates by variable amounts.  But I'm not 100% sure whether we're
avoiding that situation because it's difficult to code, or because
it's inefficient.  Or maybe we're avoiding it because it doesn't
fit well with the later RVV vsetvl pass.

Thanks,
Richard

Reply via email to