Issue 141768
Summary [LV] Maximum VF does not consider scaled reductions
Labels new issue
Assignees
Reporter preames
Reproducer: https://godbolt.org/z/4xf7c8GMM


It looks like the vectorizer has not yet been updated to consider scaled reductions (a.k.a. multiply-accumulate with extended operands) in the VF selection logic. In this case, if my tracing through the debug output is correct, we treat i32 as the widest type in the loop and derive the maximum VF to cost from that. The resulting loop runs at 1/4 of the width it should. It's still more profitable than not using the zvqdotq (scaled reduction) lowering, but it isn't ideal.
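To make the 1/4 figure concrete, here is a minimal arithmetic sketch. It is not the vectorizer's actual logic: the helper names are hypothetical, the 128-bit register width is an assumption chosen only for illustration, and it assumes a scaled reduction needs just VF / scale accumulator lanes, where scale is the ratio of the accumulator width to the input width.

```cpp
// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Number of lanes of a given element width that fit in one vector register.
constexpr unsigned lanesFor(unsigned regBits, unsigned elemBits) {
  return regBits / elemBits;
}

// Ratio of accumulator width to input width for a scaled reduction
// (4 for i8 inputs accumulated into i32).
constexpr unsigned scaleFactor(unsigned accBits, unsigned inputBits) {
  return accBits / inputBits;
}

// Maximum VF when only the widest type in the loop is considered.
constexpr unsigned maxVFWidestType(unsigned regBits, unsigned accBits) {
  return lanesFor(regBits, accBits);
}

// Maximum VF under the assumption that the accumulator only needs
// VF / scale lanes, so the narrow inputs can fill the whole register.
constexpr unsigned maxVFScaled(unsigned regBits, unsigned accBits,
                               unsigned inputBits) {
  return lanesFor(regBits, accBits) * scaleFactor(accBits, inputBits);
}

// Assumed 128-bit register, i32 accumulator, i8 inputs.
static_assert(maxVFWidestType(128, 32) == 4, "i32-based bound");
static_assert(maxVFScaled(128, 32, 8) == 16, "bound with scaling applied");
```

Under these assumptions the i32-based bound is 4 lanes while the scaled bound is 16, which is the same factor of 4 described above.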

int doti32_i8_sext(char *a, char *b, int N) {
  int sum = 0;
  for (int i = 0; i < N; i++) {
    int a32 = a[i];
    int b32 = b[i];
    sum += a32 * b32;
  }
  return sum;
}

// -O3 -x c++ -march=rv64gcv_zvqdotq0p0 -menable-experimental-extensions
.LBB0_5:
        vsetvli a5, zero, e8, mf2, ta, ma
        vle8.v  v9, (a3)
        vle8.v  v10, (a4)
        add     a4, a4, t0
        vsetvli a5, zero, e32, mf2, ta, ma
        vqdotu.vv       v8, v10, v9
        add     a3, a3, t0
        bne     a4, a7, .LBB0_5
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs