On 2024/9/10 12:34, Richard Henderson wrote:
On 9/9/24 19:46, LIU Zhiwei wrote:
    lmul = type - riscv_lg2_vlenb;
    if (lmul < -3) {
        /* Host VLEN >= 1024 bits. */
        vlmul = VLMUL_M1;
I am not sure if we should use VLMUL_MF8,
Perhaps. See below.
    } else if (lmul < 3) {
        /* 1/8 ... 1 ... 8 */
        vlmul = lmul & 7;
        lmul_eq_avl = true;
    } else {
        /* Guaranteed by Zve64x. */
        g_assert_not_reached();
    }

    avl = tcg_type_size(type) >> vsew;
    vtype = encode_vtype(true, true, vsew, vlmul);
    if (avl < 32) {
        insn = encode_i(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
What is the benefit here? We usually use the smallest lmul we can,
for macro-op splitting.
lmul is unchanged, just explicitly setting AVL as well.
The "benefit" is that AVL is visible in the disassembly,
and that we are able to discard the result.
There doesn't appear to be a downside. Is there one?
    } else if (lmul_eq_avl) {
        /* rd != 0 and rs1 == 0 uses vlmax */
        insn = encode_i(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO, vtype);
As opposed to here, where we must clobber a register.
It is a scratch reg, sure, and probably affects nothing
in any microarch which does register renaming.
    } else {
        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
        insn = encode_i(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtype);
And perhaps here.
Here, lmul does *not* equal avl, so we must set it explicitly, and
since VSETIVLI was not usable we also know that avl does not fit
in uimm5.
But here's a follow-up question regarding current micro-architectures:
How much benefit is there from adjusting LMUL < 1, or AVL < VLMAX?
LMUL < 1 may produce fewer macro ops than LMUL = 1. For example, on a
host with 128-bit vectors:
1) LMUL = 1/2, only one macro op.
    vsetivli x0, 8, e32, mf2
    vadd.vv  x2, x4, x5
2) LMUL = 1, two macro ops.
    vsetivli x0, 8, e32, m1
    vadd.vv  x2, x4, x5
For instance, on other hosts with 128-bit vectors, we also promise
support for 64-bit registers, just so we can support guests which have
64-bit vector operations. On existing hosts (x86, ppc, s390x,
loongarch) we accept that the host instruction will operate on all
128 bits; we simply ignore half of any result.
Thus the question becomes: can we minimize the number of vset*
instructions by bounding minimal lmul to 1 (or whatever) and always
leaving avl as the full register?
I think the question we are discussing is: when TCG_TYPE_V* is smaller
than VLEN, should we use a fractional lmul?
1) Fractional lmul leads to fewer macro ops (depending on the
   micro-architecture).
2) LMUL = 1 leads to fewer vset* instructions.
I prefer 1), because the vset*vli instructions we are using can
probably be fused.
Thanks,
Zhiwei
If so, the only vset* changes are for SEW changes, or for load/store
that are smaller than V*1REG64.
r~