Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> writes:
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index b3fae5b..ad838dd 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -415,6 +415,7 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
>                             bool might_wrap_p)
>  {
>    tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
> +  tree iv_type = LOOP_VINFO_MASK_IV_TYPE (loop_vinfo);
>    tree mask_type = rgm->mask_type;
>    unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
>    poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
> @@ -445,11 +446,16 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
>    tree index_before_incr, index_after_incr;
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
> -  tree zero_index = build_int_cst (compare_type, 0);
>    standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> -  create_iv (zero_index, nscalars_step, NULL_TREE, loop, &incr_gsi,
> +
> +  tree zero_index = build_int_cst (iv_type, 0);
> +  tree step = build_int_cst (iv_type,
> +                          LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> +  /* Creating IV of iv_type.  */

s/Creating/Create/

> +  create_iv (zero_index, step, NULL_TREE, loop, &incr_gsi,
>            insert_after, &index_before_incr, &index_after_incr);
>  
> +  zero_index = build_int_cst (compare_type, 0);
>    tree test_index, test_limit, first_limit;
>    gimple_stmt_iterator *test_gsi;
>    if (might_wrap_p)
> [...]
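
FWIW, the effect of the change can be modelled in plain C (all values
invented): the IV now lives in IV_TYPE and steps by the vectorization
factor, while the all-false-mask test still compares against the
iteration count in COMPARE_TYPE.

    #include <stdint.h>
    #include <stdio.h>

    int main (void)
    {
      uint64_t iv = 0;        /* zero-based IV in a 64-bit iv_type */
      const uint64_t vf = 8;  /* step == vectorization factor (assumed) */
      uint32_t niters = 255;  /* iteration count, in compare_type */

      /* The IV must reach the first multiple of VF >= NITERS (here 256)
         for the final mask to be all-false.  A 64-bit IV gets there; an
         8-bit IV would wrap to 0 first, which is the situation the
         might_wrap_p path above has to handle.  */
      while (iv < niters)
        iv += vf;
      printf ("final iv = %llu\n", (unsigned long long) iv); /* 256 */
      return 0;
    }
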
> @@ -1066,11 +1077,17 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
>         if (this_type
>             && can_produce_all_loop_masks_p (loop_vinfo, this_type))
>           {
> -           /* Although we could stop as soon as we find a valid mode,
> -              it's often better to continue until we hit Pmode, since the
> +           /* See whether zero-based IV would ever generate all-false masks
> +              before wrapping around.  */
> +           bool might_wrap_p = (iv_precision > cmp_bits);
> +           /* Stop as soon as we find a valid mode.  If we decided to use
> +              cmp_type which is less than Pmode precision, it is often better
> +              to use iv_type corresponding to Pmode, since the
>                operands to the WHILE are more likely to be reusable in
> -              address calculations.  */
> -           cmp_type = this_type;
> +              address calculations in this case.  */

We're not stopping as soon as we find a valid mode though.  Any type
that satisfies the if condition above is valid, but we pick wider
cmp_types and iv_types for optimisation reasons.  How about:

              /* Although we could stop as soon as we find a valid mode,
                 there are at least two reasons why that's not always the
                 best choice:

                 - An IV that's Pmode or wider is more likely to be reusable
                   in address calculations than an IV that's narrower than
                   Pmode.

                 - Doing the comparison in IV_PRECISION or wider allows
                   a natural 0-based IV, whereas using a narrower comparison
                   type requires mitigations against wrap-around.

                 Conversely, if the IV limit is variable, doing the comparison
                 in a wider type than the original type can introduce
                 unnecessary extensions, so picking the widest valid mode
                 is not always a good choice either.

                 Here we prefer the first IV type that's Pmode or wider,
                 and the first comparison type that's IV_PRECISION or wider.
                 (The comparison type must be no wider than the IV type,
                 to avoid extensions in the vector loop.)

                 ??? We might want to try continuing beyond Pmode for ILP32
                 targets if CMP_BITS < IV_PRECISION.  */

> +           iv_type = this_type;
> +           if (!cmp_type || iv_precision > TYPE_PRECISION (cmp_type))
> +             cmp_type = this_type;
>             if (cmp_bits >= GET_MODE_BITSIZE (Pmode))
>               break;
>           }
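
To make the intended outcome concrete, the selection then behaves like
this standalone sketch (the candidate precisions, IV_PRECISION and the
Pmode width are all invented, and the real code iterates over types
rather than raw bit counts):

    #include <stdio.h>

    int main (void)
    {
      const unsigned candidates[] = { 8, 16, 32, 64 }; /* valid precisions */
      const unsigned iv_precision = 32; /* needed by a 0-based IV */
      const unsigned pmode_bits = 64;   /* GET_MODE_BITSIZE (Pmode) */
      unsigned iv_bits = 0, cmp_bits = 0;

      for (unsigned i = 0; i < 4; i++)
        {
          unsigned bits = candidates[i];
          /* Keep widening the IV type...  */
          iv_bits = bits;
          /* ...but freeze cmp_type at the first precision that is
             IV_PRECISION or wider.  */
          if (!cmp_bits || iv_precision > cmp_bits)
            cmp_bits = bits;
          /* Stop once the IV is Pmode or wider.  */
          if (bits >= pmode_bits)
            break;
        }
      /* Prints: iv_type 64 bits, cmp_type 32 bits.  */
      printf ("iv_type %u bits, cmp_type %u bits\n", iv_bits, cmp_bits);
      return 0;
    }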

> [...]
> @@ -9014,3 +9032,45 @@ optimize_mask_stores (struct loop *loop)
>        add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
>      }
>  }
> +
> +/* Decide whether it is possible to use a zero-based induction variable
> +   when vectorizing LOOP_VINFO with a fully-masked loop.  If it is,
> +   return the value that the induction variable must be able to hold
> +   in order to ensure that the loop ends with an all-false mask.
> +   Return -1 otherwise.  */
> +widest_int
> +vect_iv_limit_for_full_masking (loop_vec_info loop_vinfo)
> +{
> +  tree niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  unsigned HOST_WIDE_INT max_vf = vect_max_vf (loop_vinfo);
> +
> +  /* Now calculate the value that the induction variable must be able

s/Now calculate/Calculate/

since this comment is no longer following on from earlier code.
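
FWIW, the limit being computed can be modelled as a quick standalone
calculation (assumed semantics and invented numbers; the real function
also has to account for skipped iterations via niters_skip):

    #include <stdint.h>
    #include <stdio.h>

    /* Smallest IV value that yields an all-false mask: the iteration
       count rounded up to a multiple of the maximum vectorization
       factor, or -1 if the iteration count is unbounded.  */
    static int64_t
    iv_limit_model (int64_t max_niters, unsigned max_vf)
    {
      if (max_niters < 0)
        return -1;
      return (max_niters + max_vf - 1) / max_vf * (int64_t) max_vf;
    }

    int main (void)
    {
      /* With 1000 iterations and max VF 16 the IV must hold 1008, so
         e.g. an 8-bit IV would be rejected.  */
      printf ("limit = %lld\n", (long long) iv_limit_model (1000, 16));
      return 0;
    }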

OK with those changes, thanks.

Richard
