https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I noticed the missed peephole optimization a long time ago and have
already filed a PR for it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014

This issue will go away once the late-combine pass from Richard Sandiford
(Arm) is merged in GCC 15.

Also, GCC supports dynamic LMUL optimization with -mrvv-max-lmul=dynamic:

https://godbolt.org/z/646nYoKbv

ASM:

count_chars(char const*, unsigned long, char):
        beq     a1,zero,.L4
        vsetvli a4,zero,e8,m1,ta,ma
        vmv.v.x v1,a2
        vsetvli zero,zero,e64,m8,ta,ma
        vmv.v.i v8,0
.L3:
        vsetvli a5,a1,e8,m1,ta,ma
        vle8.v  v0,0(a0)
        sub     a1,a1,a5
        add     a0,a0,a5
        vmseq.vv        v0,v0,v1
        vsetvli zero,zero,e64,m8,tu,mu
        vadd.vi v8,v8,1,v0.t
        bne     a1,zero,.L3
        vsetvli a5,zero,e64,m8,ta,ma
        li      a4,0
        vmv.s.x v1,a4
        vredsum.vs      v8,v8,v1
        vmv.x.s a0,v8
        ret
.L4:
        li      a0,0
        ret

GCC picks LMUL = 8 for the 64-bit accumulator because, given the register
pressure of this program, doing so does not cause any additional register
spills.
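
For readers without the Godbolt page open, a scalar loop of roughly the
following shape is consistent with the signature and the e64 accumulator
seen in the assembly. This is only an assumed reconstruction: the actual
source is behind the link above and is not quoted in this comment, and the
64-bit return type and the parameter names are my guesses.

```cpp
#include <cstdint>

// Hypothetical reconstruction of the test case; only the signature
// count_chars(char const*, unsigned long, char) is taken from the ASM.
// The 64-bit counter is assumed from the e64 vadd.vi/vredsum.vs above.
std::uint64_t count_chars(const char *src, unsigned long len, char c)
{
    std::uint64_t count = 0;
    for (unsigned long i = 0; i < len; ++i) {
        // Each match adds 1, which corresponds to the masked
        // "vadd.vi v8,v8,1,v0.t" after "vmseq.vv" in the vector loop.
        if (src[i] == c)
            ++count;
    }
    return count;
}
```

For example, count_chars("hello", 5, 'l') counts the two 'l' characters,
and a zero length takes the early-exit path that corresponds to .L4.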
