https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
To demonstrate the idea, here is a simple example that should make it easier
to understand:

https://godbolt.org/z/Gxzjv48Ec

#include "riscv_vector.h"

void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n,
         int cond, int avl) {
    size_t vl = __riscv_vsetvl_e16mf2(avl >> 2);
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
    __riscv_vse32_v_i32m1(out, c, vl);
}

LLVM:

        srai    a4, a6, 2
        vsetvli zero, a4, e16, mf2, ta, ma
        vle32.v v8, (a0)
        vsetvli zero, zero, e32, m1, tu, ma
        vle32.v v8, (a1)
        vle32.v v8, (a2)
        vse32.v v8, (a3)
        ret

LLVM generates the naive code that follows the intrinsics directly: as you
said, the first vsetvli keeps e16, mf2 unchanged, so a second vsetvli is
needed to switch to e32, m1, tu.

Here is the codegen of GCC:
GCC:

        srai    a6,a6,2
        vsetvli a6,a6,e32,m1,tu,ma
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        ret

Since e16, mf2 has the same SEW/LMUL ratio as e32, m1 (and therefore yields
the same vl), we change the first vsetvl from e16, mf2 to e32, m1, tu.

Then we can eliminate the second vsetvl.

This is what we call "local fusion" here.
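
As a rough illustration of the ratio check behind this fusion, here is a
minimal sketch in plain C. It is not the GCC implementation; the struct and
the helper names (vtype_cfg, sew_lmul_ratio, vlmax, same_vl) are hypothetical
and exist only for this example. It relies on the RVV rule that
VLMAX = VLEN * LMUL / SEW, so two configurations with the same SEW/LMUL ratio
produce the same vl for the same AVL.

```c
#include <stdbool.h>

/* Hypothetical model of a vtype setting. LMUL is stored as a fraction
   num/den, so mf2 is 1/2 and m1 is 1/1. */
struct vtype_cfg {
    unsigned sew;       /* element width in bits: 8, 16, 32 or 64 */
    unsigned lmul_num;  /* LMUL numerator */
    unsigned lmul_den;  /* LMUL denominator */
};

static const struct vtype_cfg E16_MF2 = {16, 1, 2};
static const struct vtype_cfg E32_M1  = {32, 1, 1};
static const struct vtype_cfg E32_M2  = {32, 2, 1};

/* SEW/LMUL ratio, computed as sew * den / num. */
static unsigned sew_lmul_ratio(struct vtype_cfg c) {
    return c.sew * c.lmul_den / c.lmul_num;
}

/* VLMAX = VLEN * LMUL / SEW for a given VLEN in bits. */
static unsigned vlmax(unsigned vlen_bits, struct vtype_cfg c) {
    return vlen_bits * c.lmul_num / (c.sew * c.lmul_den);
}

/* Two settings give the same vl for the same AVL when their
   SEW/LMUL ratios match. */
static bool same_vl(struct vtype_cfg a, struct vtype_cfg b) {
    return sew_lmul_ratio(a) == sew_lmul_ratio(b);
}
```

With these definitions, same_vl(E16_MF2, E32_M1) holds because both ratios
are 32, while same_vl(E16_MF2, E32_M2) does not, since e32, m2 has ratio 16.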

The case you mentioned is "global fusion", but they are the same thing.

Both fuse vsetvl instructions according to the RVV ISA.

So, for the example you mentioned, GCC is generating correct code.
