https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814
--- Comment #3 from Feng Xue <fxue at os dot amperecomputing.com> ---
The pattern in this code belongs to a generic dot-product category, so we
could consider mapping it to the native dot-product instruction with a
constant "1" operand:
        movi    v29.16b, 0x1
.L4:
        ldr     q31, [x1], 16
        cmeq    v31.16b, v28.16b, v31.16b
        and     v31.16b, v29.16b, v31.16b
        udot    v30.4s, v31.16b, v29.16b
        cmp     x5, x1
        bne     .L4
        addv    s31, v30.4s
        fmov    w1, s31
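
For illustration, here is a scalar C model of what the sequence above
computes. It is only a sketch: the function names are made up, the source loop
is assumed to count the bytes equal to a key, and the length is assumed to be
a multiple of 16 so that no tail handling is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed source loop: count the bytes in buf[0..n) equal to key. */
uint32_t count_eq_scalar(const uint8_t *buf, size_t n, uint8_t key)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == key)
            count++;
    return count;
}

/* Scalar model of the vector loop (hypothetical helper, not GCC code).
   n is assumed to be a multiple of 16. */
uint32_t count_eq_dot_model(const uint8_t *buf, size_t n, uint8_t key)
{
    assert(n % 16 == 0);
    uint32_t acc[4] = {0, 0, 0, 0};                 /* models v30.4s */
    for (size_t i = 0; i < n; i += 16) {            /* one ldr q31 per step */
        for (int lane = 0; lane < 4; lane++) {
            for (int k = 0; k < 4; k++) {
                uint8_t b = buf[i + 4 * lane + k];
                uint8_t mask = (b == key) ? 0xFF : 0x00;   /* cmeq */
                uint8_t bit = mask & 1;                    /* and with splat of 1 */
                acc[lane] += (uint32_t)bit * 1u;           /* udot with constant 1 */
            }
        }
    }
    return acc[0] + acc[1] + acc[2] + acc[3];       /* addv */
}
```

Each 32-bit lane of the accumulator collects the sum of four byte products,
which is why multiplying by a vector of ones turns the dot-product into a
widening count.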
If the accumulation does not require widening, as in this case, REDUC_PLUS
could be used instead; it can be seen as a special instance of the dot-product
instruction. One point to note: this kind of REDUC_PLUS should be regarded as
touching the whole vector register, modifying the first element and clearing
the rest. Either way, it would become an addv.
For SVE, since the element count is variable, the element type may not be able
to hold the accumulation result, so only dot-product could be used.
Moreover, the approach could be extended to handle conditional accumulation,
such as:
  for (i) {
      if (cond)
          sum += a;    // => sum += cond * a;
  }
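
A minimal C sketch of that rewrite, assuming byte inputs, a 32-bit sum, and an
equality test as the condition. The function names and the specific condition
are illustrative only and are not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: add a[i] only when the condition holds. */
uint32_t sum_if_branch(const uint8_t *a, const uint8_t *b, size_t n,
                       uint8_t key)
{
    assert(n == 0 || (a && b));
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (b[i] == key)
            sum += a[i];
    return sum;
}

/* Product form: the condition becomes a 0/1 multiplier, so the loop body is
   a widening multiply-accumulate, i.e. a dot-product of cond[] and a[]. */
uint32_t sum_if_product(const uint8_t *a, const uint8_t *b, size_t n,
                        uint8_t key)
{
    assert(n == 0 || (a && b));
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t cond = (uint8_t)(b[i] == key);   /* 0 or 1 */
        sum += (uint32_t)cond * a[i];
    }
    return sum;
}
```

Both functions return the same value for any input; the second form has no
control flow in the loop body, which is the shape a dot-product pattern would
need to match.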