Hi Craig,

thanks for working on this, it has been on my TODO list for a while.

In general this looks reasonable to me.

> +       poly_uint64 mode_units;
>         /* Find the mode to use for the copy inside the loop - or the
>            sole copy, if there is no loop.  */
>         if (!need_loop)
> @@ -1152,12 +1166,12 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
>                pointless.
>                Still, by choosing a lower LMUL factor that still allows
>                an entire transfer, we can reduce register pressure.  */
> -           for (unsigned lmul = 1; lmul <= 4; lmul <<= 1)
> -             if (length * BITS_PER_UNIT <= TARGET_MIN_VLEN * lmul
> -                 && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew)
> +           for (unsigned lmul = 1; lmul < TARGET_MAX_LMUL; lmul <<= 1)
> +             if (known_le (length * BITS_PER_UNIT, TARGET_MIN_VLEN * lmul)
> +                 && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew,
> +                                &mode_units)
>                   && (riscv_vector::get_vector_mode
> -                      (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul,
> -                                  potential_ew)).exists (&vmode)))
> +                      (elem_mode, mode_units).exists (&vmode)))
>                 break;

> +/* Return the appropriate LMUL mode for MODE.  */
> +
> +opt_machine_mode
> +get_lmul_mode (scalar_mode mode, int lmul)
> +{
> +  poly_uint64 lmul_nunits;
> +  unsigned int bytes = GET_MODE_SIZE (mode);
> +  if (multiple_p (BYTES_PER_RISCV_VECTOR * lmul, bytes, &lmul_nunits))
> +    return get_vector_mode (mode, lmul_nunits);
> +  return E_VOIDmode;
> +}

I don't fully see the need for this function just for the single caller.
The request for the "largest vector mode with inner mode MODE" is
common to other "string" functions as well, and what we do there is:

  poly_int64 nunits = exact_div
      (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL, GET_MODE_SIZE (mode));

  machine_mode vmode;
  if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode), nunits)
         .exists (&vmode))
    gcc_unreachable ();

The natural generalization to "largest vector mode up to LMUL" is useful
in instances where we know the AVL.
So maybe you'd want to slightly enhance your function and use it for
the other instances we have?  You could probably also use it inside your
loop just above.
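
To illustrate roughly what I have in mind, here is a standalone sketch of
the "largest vector mode up to LMUL" idea and the loop built on top of it.
It is only a model: plain integers stand in for poly_uint64, and the names
lmul_nunits, smallest_lmul_for_length, bytes_per_vector and lmul_limit are
made up for this example and are not existing GCC functions or macros.

```cpp
#include <optional>

// Model of the helper: number of elements of size ELEM_BYTES that fit in a
// register group of LMUL vector registers, or nullopt if the group size is
// not a whole multiple of the element size.  BYTES_PER_VECTOR is a plain
// constant here (an assumption); in GCC it would be a poly value.
std::optional<unsigned>
lmul_nunits (unsigned bytes_per_vector, unsigned elem_bytes, unsigned lmul)
{
  unsigned group_bytes = bytes_per_vector * lmul;
  if (elem_bytes == 0 || group_bytes % elem_bytes != 0)
    return std::nullopt;
  return group_bytes / elem_bytes;
}

// Model of the loop: smallest power-of-two LMUL, up to and including
// LMUL_LIMIT (a hypothetical inclusive bound), whose register group can
// hold LENGTH bytes in one transfer and for which the helper succeeds.
std::optional<unsigned>
smallest_lmul_for_length (unsigned bytes_per_vector, unsigned length,
                          unsigned elem_bytes, unsigned lmul_limit)
{
  for (unsigned lmul = 1; lmul <= lmul_limit; lmul <<= 1)
    if (length <= bytes_per_vector * lmul
        && lmul_nunits (bytes_per_vector, elem_bytes, lmul))
      return lmul;
  return std::nullopt;
}
```

With 16-byte vectors, a 40-byte copy of byte elements and a limit of 4,
the loop in this model settles on LMUL 4, while a 100-byte copy finds no
single-transfer candidate.  The real helper would of course return the
vector mode rather than a unit count.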

-- 
Regards
 Robin
