https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

            Bug ID: 114809
           Summary: [RISC-V RVV] Counting elements might be simpler
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Consider this simple procedure:

---
#include <cstdint>
#include <cstdlib>

size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;
    for (size_t i=0; i < len; i++) {
        count += src[i] == c;
    }

    return count;
}
---

The assembly generated for it (GCC 14.0, -march=rv64gcv -O3):

---
count_chars(char const*, unsigned long, char):
        beq     a1,zero,.L4
        vsetvli a4,zero,e8,mf8,ta,ma
        vmv.v.x v2,a2
        vsetvli zero,zero,e64,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a1,e8,mf8,ta,ma
        vle8.v  v0,0(a0)
        sub     a1,a1,a5
        add     a0,a0,a5
        vmseq.vv        v0,v0,v2
        vsetvli zero,zero,e64,m1,tu,mu
        vadd.vi v1,v1,1,v0.t
        bne     a1,zero,.L3
        vsetvli a5,zero,e64,m1,ta,ma
        li      a4,0
        vmv.s.x v2,a4
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret
---

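The counting procedure could apply `vcpop.m` to the mask produced by
`vmseq.vv` and accumulate the result in a scalar register, instead of updating
a vector of counters (`v1`) and summing them at the end. This would remove the
switches between e8 and e64 from the loop, leaving only the `vsetvli` that
sets the vector length.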
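
For illustration only, here is a rough sketch of the `vcpop.m` approach written
with the RVV C intrinsics. It is not what GCC emits and not a proposed patch.
The function name `count_chars_vcpop` and the choice of LMUL=1 are my
assumptions. The scalar fallback is only there so the snippet also builds on
targets without the vector extension.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical name; same contract as count_chars() in the report.
std::size_t count_chars_vcpop(const char *src, std::size_t len, char c) {
    std::size_t count = 0;
#if defined(__riscv_vector)
    const std::uint8_t *p = reinterpret_cast<const std::uint8_t *>(src);
    const std::uint8_t needle = static_cast<std::uint8_t>(c);
    while (len > 0) {
        // One vsetvli per iteration, always e8: no SEW switch in the loop.
        std::size_t vl = __riscv_vsetvl_e8m1(len);
        vuint8m1_t chunk = __riscv_vle8_v_u8m1(p, vl);
        // Mask of positions equal to the searched byte (vmseq.vx).
        vbool8_t eq = __riscv_vmseq_vx_u8m1_b8(chunk, needle, vl);
        // vcpop.m counts the set mask bits into a scalar register.
        count += __riscv_vcpop_m_b8(eq, vl);
        p += vl;
        len -= vl;
    }
#else
    // Scalar fallback (assumption: only for non-RVV builds).
    for (std::size_t i = 0; i < len; i++) {
        count += src[i] == c;
    }
#endif
    return count;
}
```

With this shape the accumulator is a plain scalar, so the final `vredsum.vs`
and the e64 vector of counters should no longer be needed.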

Separately, there is a missing peephole optimization:

        li      a4,0
        vmv.s.x v2,a4

It could be a single instruction that uses the `zero` register directly:

        vmv.s.x v2,zero
