https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440
Bug ID: 114440
Summary: Fail to recognize a chain of lane-reduced operations
for loop reduction vect
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: fxue at os dot amperecomputing.com
Target Milestone: ---
In a loop reduction path containing a lane-reduced operation
(DOT_PROD/SAD/WIDEN_SUM), the current vectorizer cannot handle the pattern if
the path also contains other operations, whether normal ones or further
lane-reduced ones. A pseudo example:
char *d0, *d1;
char *s0, *s1;
char *w;
int *n;
...
int sum = 0;
for (i) {
  ...
  sum += d0[i] * d1[i];        /* DOT_PROD */
  ...
  sum += abs(s0[i] - s1[i]);   /* SAD */
  ...
  sum += w[i];                 /* WIDEN_SUM */
  ...
  sum += n[i];                 /* Normal */
  ...
}
... = sum;
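For anyone who wants to try this, the pseudo example can be written out as a compilable function. This is only a sketch: the function name mixed_reduction and the len parameter are my own additions, the elided "..." parts are simply left out, and plain char is kept as in the pseudo code (its signedness is target-dependent).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reproducer; the name and signature are illustrative
   and not taken from the GCC testsuite.  */
int
mixed_reduction (const char *d0, const char *d1,
                 const char *s0, const char *s1,
                 const char *w, const int *n, int len)
{
  assert (len >= 0);
  int sum = 0;
  for (int i = 0; i < len; i++)
    {
      sum += d0[i] * d1[i];        /* DOT_PROD candidate */
      sum += abs (s0[i] - s1[i]);  /* SAD candidate */
      sum += w[i];                 /* WIDEN_SUM candidate */
      sum += n[i];                 /* normal int addition */
    }
  return sum;
}
```

Compiling something like this with -O3 and -fopt-info-vec-missed (on a target that provides the relevant lane-reducing instructions) should show whether the reduction chain gets vectorized.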
In this case the reduction vectype varies from operation to operation, which
causes a mismatch between the number of vectorized defs and uses. One possible
fix is to generate extra trivial pass-through copies. Take a concrete example:
sum = 0;
for (i) {
  sum += d0[i] * d1[i];   /* 16*char -> 4*int */
  sum += n[i];            /* 4*int -> 4*int */
}
The final vectorized statements could be:
sum_v0 = { 0, 0, 0, 0 };
sum_v1 = { 0, 0, 0, 0 };
sum_v2 = { 0, 0, 0, 0 };
sum_v3 = { 0, 0, 0, 0 };
for (i / 16) {
  sum_v0 += DOT_PROD (v_d0[i: 0 .. 15], v_d1[i: 0 .. 15]);
  sum_v1 += 0;  // copy
  sum_v2 += 0;  // copy
  sum_v3 += 0;  // copy
  sum_v0 += v_n[i: 0 .. 3];
  sum_v1 += v_n[i: 4 .. 7];
  sum_v2 += v_n[i: 8 .. 11];
  sum_v3 += v_n[i: 12 .. 15];
}
sum = REDUC_PLUS(sum_v0 + sum_v1 + sum_v2 + sum_v3);
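To sanity-check that the pass-through scheme above keeps the result intact, here is a scalar emulation of the sequence in plain C. It is a sketch under assumptions: the function names, the LANES/GROUP macros and the lane assignment inside DOT_PROD (each int lane takes four adjacent products) are my own choices for illustration, and the length is assumed to be a multiple of 16 so that no epilogue is needed.

```c
#include <assert.h>

#define LANES 4    /* ints per vector, matching the 4*int vectype */
#define GROUP 16   /* chars consumed per vector iteration */

/* Reference: the original scalar loop.  */
int
scalar_sum (const char *d0, const char *d1, const int *n, int len)
{
  int sum = 0;
  for (int i = 0; i < len; i++)
    {
      sum += d0[i] * d1[i];
      sum += n[i];
    }
  return sum;
}

/* Emulates the four accumulators sum_v0 .. sum_v3 lane by lane.  */
int
emulated_vector_sum (const char *d0, const char *d1, const int *n, int len)
{
  assert (len >= 0 && len % GROUP == 0);
  int sum_v[4][LANES] = {{0}};

  for (int i = 0; i < len; i += GROUP)
    {
      /* DOT_PROD: 16 char products are folded into the 4 lanes of
         sum_v[0] (assumed lane assignment: 4 adjacent products each).  */
      for (int k = 0; k < GROUP; k++)
        sum_v[0][k / 4] += d0[i + k] * d1[i + k];

      /* sum_v[1] .. sum_v[3] pass through unchanged here; these are
         the "+= 0" copies.  */

      /* Normal addition: 16 ints spread over all four accumulators.  */
      for (int v = 0; v < 4; v++)
        for (int l = 0; l < LANES; l++)
          sum_v[v][l] += n[i + v * LANES + l];
    }

  /* Epilogue: add the four vectors together, then REDUC_PLUS.  */
  int sum = 0;
  for (int v = 0; v < 4; v++)
    for (int l = 0; l < LANES; l++)
      sum += sum_v[v][l];
  return sum;
}
```

For any input whose length is a multiple of 16, both functions should return the same value, since only the order of the additions differs.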
In the above sequence, each summation statement forms exactly one pattern.
However, we could easily compose a slightly more complicated variant that runs
into a similar situation: a chain of lane-reduced operations that comes from the
non-reduction addend of a single summation statement, like:
sum += d0[i] * d1[i] + abs(s0[i] - s1[i]) + n[i];
This probably requires some extension in the vector pattern formation stage to
split the patterns.
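At the source level, the split would amount to something like the following. This is only an illustration of the intended equivalence, not of how the vectorizer would implement it; the function names and the len parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Single statement: the lane-reduced operations are chained inside
   the non-reduction addend.  */
int
sum_combined (const char *d0, const char *d1,
              const char *s0, const char *s1,
              const int *n, int len)
{
  assert (len >= 0);
  int sum = 0;
  for (int i = 0; i < len; i++)
    sum += d0[i] * d1[i] + abs (s0[i] - s1[i]) + n[i];
  return sum;
}

/* Split form: one summation statement per candidate pattern.  */
int
sum_split (const char *d0, const char *d1,
           const char *s0, const char *s1,
           const int *n, int len)
{
  assert (len >= 0);
  int sum = 0;
  for (int i = 0; i < len; i++)
    {
      sum += d0[i] * d1[i];        /* DOT_PROD candidate */
      sum += abs (s0[i] - s1[i]);  /* SAD candidate */
      sum += n[i];                 /* normal int addition */
    }
  return sum;
}
```

As long as no intermediate value overflows, both forms compute the same sum, so splitting only reassociates the additions.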