https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Oh. I see what you mean.

I think it may not be a valid optimization.

Consider the following code:

.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        slli    a4,a5,2
        vle32.v v1,0(a1)
        sub     a0,a0,a5
        vadd.vv v1,v1,v2
        vse32.v v1,0(a2)
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma --- > seems redundant
        add     a2,a2,a4
        vadd.vv v2,v2,v4
        bne     a0,zero,.L3

Suppose a vector register holds 8 elements (VLMAX = 8), and a0 is 13 entering
the last 2 iterations.

If we remove the VLMAX vsetvli, which seems redundant, we may have issues on
some hardware.

With 13 elements remaining, the hardware can choose to process 6 elements in
the second-to-last iteration and 7 elements in the last iteration.

The result of the VLMAX vadd.vv is used by the next iteration, NOT the current
iteration. Without the VLMAX vsetvli, that vadd.vv would run with vl = 6 and
produce only 6 elements for the last iteration, which needs 7 elements.

That would cause a bug. So, it is not a valid optimization...
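
To make the scenario concrete, here is a minimal Python sketch of the lane
bookkeeping. It is only a model, not GCC or hardware behaviour: the helper
names (vadd, run), the initial contents of v2 and v4, and the choice to leave
tail lanes unchanged are all assumptions for illustration. The 6-then-7 split
is taken directly from the supposition above.

```python
VLMAX = 8  # assumed: one vector register holds 8 elements


def vadd(dst, a, b, vl):
    """Model vadd.vv: only the first vl lanes are written.

    Tail lanes are left unchanged in this model. Under the tail-agnostic
    policy they could also be overwritten, so they cannot be relied on.
    """
    return [a[i] + b[i] if i < vl else dst[i] for i in range(len(dst))]


def run(vls, keep_vlmax_vsetvli):
    """Return the v2 lanes read by the loop body in each iteration."""
    v2 = list(range(VLMAX))  # hypothetical initial values
    v4 = [VLMAX] * VLMAX     # hypothetical loop-invariant addend
    used = []
    for vl in vls:
        used.append(v2[:vl])  # first vadd.vv reads vl lanes of v2
        # Second vadd.vv updates v2 for the NEXT iteration.
        update_vl = VLMAX if keep_vlmax_vsetvli else vl
        v2 = vadd(v2, v2, v4, update_vl)
    return used


vls = [6, 7]  # 13 elements split as 6 then 7, as supposed above
with_vsetvli = run(vls, keep_vlmax_vsetvli=True)
without_vsetvli = run(vls, keep_vlmax_vsetvli=False)

print("last iteration, vsetvli kept:   ", with_vsetvli[1])
print("last iteration, vsetvli removed:", without_vsetvli[1])
```

In this model the two runs agree in the second-to-last iteration, but in the
last iteration the seventh lane of v2 was never updated once the VLMAX vsetvli
is removed, so the loop body reads a stale value there.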
