https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809
            Bug ID: 114809
           Summary: [RISC-V RVV] Counting elements might be simpler
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Consider this simple procedure:

---
#include <cstdint>
#include <cstdlib>

size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;

    for (size_t i=0; i < len; i++) {
        count += src[i] == c;
    }

    return count;
}
---

Assembly for it (GCC 14.0, -march=rv64gcv -O3):

---
count_chars(char const*, unsigned long, char):
        beq     a1,zero,.L4
        vsetvli a4,zero,e8,mf8,ta,ma
        vmv.v.x v2,a2
        vsetvli zero,zero,e64,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a1,e8,mf8,ta,ma
        vle8.v  v0,0(a0)
        sub     a1,a1,a5
        add     a0,a0,a5
        vmseq.vv        v0,v0,v2
        vsetvli zero,zero,e64,m1,tu,mu
        vadd.vi v1,v1,1,v0.t
        bne     a1,zero,.L3
        vsetvli a5,zero,e64,m1,ta,ma
        li      a4,0
        vmv.s.x v2,a4
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret
---

The counting procedure might use `vcpop.m` instead of updating a vector of
counters (`v1`) and summing them at the end. This would move all mode switches
outside the loop.

There is also a missing peephole optimization:

        li      a4,0
        vmv.s.x v2,a4

It can be:

        vmv.s.x v2,zero