Hi Craig,

thanks for working on this; it has been on my TODO list for a while.
In general this looks reasonable to me.

> +      poly_uint64 mode_units;
>        /* Find the mode to use for the copy inside the loop - or the
>           sole copy, if there is no loop.  */
>        if (!need_loop)
> @@ -1152,12 +1166,12 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
>               pointless.
>               Still, by choosing a lower LMUL factor that still allows
>               an entire transfer, we can reduce register pressure.  */
> -          for (unsigned lmul = 1; lmul <= 4; lmul <<= 1)
> -            if (length * BITS_PER_UNIT <= TARGET_MIN_VLEN * lmul
> -                && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew)
> +          for (unsigned lmul = 1; lmul < TARGET_MAX_LMUL; lmul <<= 1)
> +            if (known_le (length * BITS_PER_UNIT, TARGET_MIN_VLEN * lmul)
> +                && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew,
> +                               &mode_units)
>                  && (riscv_vector::get_vector_mode
> -                    (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul,
> -                                           potential_ew)).exists (&vmode)))
> +                    (elem_mode, mode_units).exists (&vmode)))
>                break;

> +/* Return the appropriate LMUL mode for MODE.  */
> +
> +opt_machine_mode
> +get_lmul_mode (scalar_mode mode, int lmul)
> +{
> +  poly_uint64 lmul_nunits;
> +  unsigned int bytes = GET_MODE_SIZE (mode);
> +  if (multiple_p (BYTES_PER_RISCV_VECTOR * lmul, bytes, &lmul_nunits))
> +    return get_vector_mode (mode, lmul_nunits);
> +  return E_VOIDmode;
> +}

I don't fully see the need for this function just for the single caller.

The ask for "largest vector mode with inner mode MODE" is common to other
"string" functions as well and what we do there is

  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
                                 GET_MODE_SIZE (mode));
  machine_mode vmode;
  if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode), nunits)
         .exists (&vmode))
    gcc_unreachable ();

The natural generalization to "largest vector mode up to LMUL" is useful in
instances where we know the AVL.  So maybe you'd want to slightly enhance
your function and use it for the other instances we have?
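
To make the "largest vector mode up to LMUL" idea a bit more concrete, here
is a standalone toy model of the selection logic.  It is not GCC code: plain
integers stand in for poly_int and machine modes, and every name in it
(VecMode, lmul_mode, smallest_mode_for_avl) is made up for illustration.  A
real version would presumably go through riscv_vector::get_vector_mode and
BYTES_PER_RISCV_VECTOR instead.

```cpp
#include <optional>

// Toy stand-in for a vector mode (hypothetical, not a GCC type).
struct VecMode
{
  unsigned elem_bytes; // size of one element in bytes
  unsigned nunits;     // number of elements in the register group
  unsigned lmul;       // register grouping factor
};

// Model of "vector mode with the given element size at exactly LMUL".
// Returns nullopt when the group size is not a multiple of the element
// size, mirroring the multiple_p check in the quoted patch.
std::optional<VecMode>
lmul_mode (unsigned vlen_bytes, unsigned elem_bytes, unsigned lmul)
{
  unsigned total = vlen_bytes * lmul;
  if (elem_bytes == 0 || total % elem_bytes != 0)
    return std::nullopt;
  return VecMode{elem_bytes, total / elem_bytes, lmul};
}

// Model of "largest vector mode up to MAX_LMUL" when the AVL is known:
// pick the smallest power-of-two LMUL whose register group holds
// AVL_BYTES, and fall back to MAX_LMUL if the transfer does not fit.
std::optional<VecMode>
smallest_mode_for_avl (unsigned vlen_bytes, unsigned elem_bytes,
                       unsigned avl_bytes, unsigned max_lmul)
{
  for (unsigned lmul = 1; lmul < max_lmul; lmul <<= 1)
    if (avl_bytes <= vlen_bytes * lmul)
      if (auto mode = lmul_mode (vlen_bytes, elem_bytes, lmul))
        return mode;
  return lmul_mode (vlen_bytes, elem_bytes, max_lmul);
}
```

With a 16-byte vector register, a 17-byte transfer of byte elements would
select LMUL 2 in this model, while anything above 64 bytes ends up at the
maximum LMUL.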

You could probably also use it inside your loop just above.

-- 
Regards
 Robin