https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809
JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I noticed this missed peephole optimization a long time ago and have already
filed a PR for it:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014

The issue will go away after Richard Sandiford (Arm) merges the late-combine
pass in GCC 15.

Also, GCC supports dynamic LMUL optimization with -mrvv-max-lmul=dynamic:

https://godbolt.org/z/646nYoKbv

ASM:

count_chars(char const*, unsigned long, char):
        beq     a1,zero,.L4
        vsetvli a4,zero,e8,m1,ta,ma
        vmv.v.x v1,a2
        vsetvli zero,zero,e64,m8,ta,ma
        vmv.v.i v8,0
.L3:
        vsetvli a5,a1,e8,m1,ta,ma
        vle8.v  v0,0(a0)
        sub     a1,a1,a5
        add     a0,a0,a5
        vmseq.vv        v0,v0,v1
        vsetvli zero,zero,e64,m8,tu,mu
        vadd.vi v8,v8,1,v0.t
        bne     a1,zero,.L3
        vsetvli a5,zero,e64,m8,ta,ma
        li      a4,0
        vmv.s.x v1,a4
        vredsum.vs      v8,v8,v1
        vmv.x.s a0,v8
        ret
.L4:
        li      a0,0
        ret

GCC picks LMUL = 8 for the 64-bit accumulator because, given the register
pressure of this program, doing so does not cause any additional register
spills.
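
For readers without the Godbolt link at hand, here is a rough sketch of the
kind of source that could produce the listing above. The source is not
reproduced in this comment, so the body below is an assumption inferred from
the mangled signature and the 64-bit accumulator in the ASM. The vlmax()
helper is hypothetical and only illustrates the standard RVV relation
VLMAX = VLEN * LMUL / SEW: e8/m1 and e64/m8 hold the same number of elements,
which is why the 8-bit loads and the 64-bit accumulator can share one vl.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed reconstruction of the test case: count the bytes in
// src[0..len) that are equal to c, accumulating in a 64-bit counter.
std::uint64_t count_chars(const char *src, std::size_t len, char c)
{
    std::uint64_t count = 0;
    for (std::size_t i = 0; i < len; ++i)
        count += (src[i] == c);   // comparison yields 0 or 1
    return count;
}

// Hypothetical helper (not a GCC or RVV API): number of elements per
// vector register group for a given VLEN (bits), SEW (bits) and
// integer LMUL.
constexpr unsigned vlmax(unsigned vlen, unsigned sew, unsigned lmul)
{
    return vlen * lmul / sew;
}

// VLEN = 128 is only an example value; the ratio holds for any VLEN.
static_assert(vlmax(128, 8, 1) == vlmax(128, 64, 8),
              "e8/m1 and e64/m8 have the same element count");
```

With a narrower accumulator type the compiler would need a smaller LMUL for
the accumulator to keep the same SEW/LMUL ratio, so the chosen LMUL depends on
the element types in the loop as well as on register pressure.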