| Field | Value |
|---|---|
| Issue | 119999 |
| Summary | [mlir] Inconsistent output when executing MLIR program with `affine-parallelize` and `--affine-super-vectorize` |
| Labels | mlir |
| Assignees | |
| Reporter | Emilyaxe |

git version: ff939b06a5
system: `Ubuntu 18.04.6 LTS`
## Description:
I get inconsistent results when running the same MLIR program with and without the `--affine-parallelize` and `--affine-super-vectorize` passes.
The mismatch appears only when both passes are enabled: removing either one restores the correct output, so I am unsure which pass contains the bug.
## Steps to Reproduce:
### 1. **MLIR program (`tosa.mlir`):**
```
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %0 = "tosa.const"() <{value = dense<[0, 2, 1]> : tensor<3xi32>}> : () -> tensor<3xi32>
    %1 = "tosa.const"() <{value = dense<-12> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %2 = "tosa.const"() <{value = dense<1676> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %3 = "tosa.const"() <{value = dense<-10> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %4 = tosa.abs %2 : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %5 = tosa.clamp %4 {max_fp = 1.600000e+01 : f32, max_int = 16 : i64, min_fp = 0.000000e+00 : f32, min_int = 0 : i64} : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %6 = tosa.arithmetic_right_shift %2, %5 {round = true} : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %7 = tosa.minimum %6, %1 : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %8 = tosa.transpose %3, %0 : (tensor<1x4x21xi32>, tensor<3xi32>) -> tensor<1x21x4xi32>
    %9 = tosa.matmul %7, %8 : (tensor<1x4x21xi32>, tensor<1x21x4xi32>) -> tensor<1x4x4xi32>
    %cast = tensor.cast %9 : tensor<1x4x4xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
```
### 2. **Command to run without `--affine-parallelize` and `--affine-super-vectorize`:**
```
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir \
  -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
  --linalg-generalize-named-ops \
  -tosa-to-arith \
  -convert-math-to-llvm \
  --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" \
  -one-shot-bufferize="bufferize-function-boundaries" \
  -convert-arith-to-llvm \
  -convert-linalg-to-affine-loops \
  -convert-vector-to-scf \
  -convert-arith-to-llvm \
  --affine-loop-coalescing \
  -convert-vector-to-scf \
  -convert-vector-to-llvm \
  -convert-math-to-llvm \
  -convert-arith-to-llvm \
  -lower-affine \
  -convert-scf-to-cf \
  -finalize-memref-to-llvm \
  -convert-func-to-llvm \
  -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner \
  -e main -entry-point-result=void \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```
### 3. **Output without `--affine-parallelize` and `--affine-super-vectorize`:**
```
[[[2520, 2520, 2520, 2520],
[2520, 2520, 2520, 2520],
[2520, 2520, 2520, 2520],
[2520, 2520, 2520, 2520]]]
```
### 4. **Command to run with `--affine-parallelize` and `--affine-super-vectorize`:**
```
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir \
  -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
  --linalg-generalize-named-ops \
  -tosa-to-arith \
  -convert-math-to-llvm \
  --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" \
  -one-shot-bufferize="bufferize-function-boundaries" \
  -convert-arith-to-llvm \
  -convert-linalg-to-affine-loops \
  --affine-parallelize \
  -convert-vector-to-scf \
  -convert-arith-to-llvm \
  --affine-loop-coalescing \
  -convert-vector-to-scf \
  --affine-super-vectorize="virtual-vector-size=128 test-fastest-varying=0 vectorize-reductions=true" \
  -convert-vector-to-llvm \
  -convert-math-to-llvm \
  -convert-arith-to-llvm \
  -lower-affine \
  -convert-scf-to-cf \
  -finalize-memref-to-llvm \
  -convert-func-to-llvm \
  -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner \
  -e main -entry-point-result=void \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```
### 5. **Output with `--affine-parallelize` and `--affine-super-vectorize`:**
```
[[[120, 120, 120, 120],
[120, 120, 120, 120],
[120, 120, 120, 120],
[120, 120, 120, 120]]]
```
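
As a sanity check on which of the two outputs is the expected one, the program can be evaluated by hand, since every input tensor is a splat constant and each output element therefore reduces to scalar arithmetic. The Python sketch below does this. It is not part of the reproducer: the helper `tosa_arithmetic_right_shift` is a hypothetical name, and its rounding rule is my reading of the TOSA specification (add one when `round` is set, the shift is positive, and the last bit shifted out is 1), so treat it as an assumption.

```python
# Scalar reference evaluation of tosa.mlir.
# Assumption: all tensors are splat constants, so one scalar per tensor suffices.

def tosa_arithmetic_right_shift(value: int, shift: int, round_: bool) -> int:
    """Hypothetical helper: arithmetic right shift with TOSA-style rounding
    (assumed semantics, see the lead-in)."""
    result = value >> shift  # Python's >> on ints is an arithmetic shift
    if round_ and shift > 0 and ((value >> (shift - 1)) & 1):
        result += 1
    return result

K = 21  # reduction dimension of the 1x4x21 by 1x21x4 matmul

c1, c2, c3 = -12, 1676, -10                      # %1, %2, %3
v4 = abs(c2)                                     # %4: tosa.abs
v5 = max(0, min(16, v4))                         # %5: tosa.clamp to [0, 16]
v6 = tosa_arithmetic_right_shift(c2, v5, True)   # %6: tosa.arithmetic_right_shift
v7 = min(v6, c1)                                 # %7: tosa.minimum
v8 = c3                                          # %8: transposing a splat keeps its value

term = v7 * v8                                   # one product inside the matmul reduction
expected = sum(term for _ in range(K))           # %9: every output element

print(f"single term = {term}, expected element = {expected}")
```

Under these assumptions every element of `%9` should be 2520, which matches the output in step 3. The value 120 printed in step 5 equals a single product term, i.e. 2520 / 21. That is consistent with only one of the 21 contributions along the reduction dimension reaching the result, although this alone does not show which pass is responsible.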