| Issue | 130002 |
| --- | --- |
| Summary | [mlir] Inconsistent output when executing MLIR program with `linalg-specialize-generic-ops` |
| Labels | mlir |
| Assignees | |
| Reporter | Emilyaxe |

git version: 953838d
system: `Ubuntu 18.04.6 LTS`
## Description:
I am experiencing an inconsistent result when executing the same MLIR program with and without `linalg-specialize-generic-ops`.
## Steps to Reproduce:
### 1. **MLIR Program (a.mlir)**:
```
module {
func.func private @printMemrefI32(tensor<*xi32>)
func.func private @printMemrefF32(tensor<*xf32>)
func.func @main() -> () {
%arg0 = index.constant 0
%6 = "tosa.const"() <{value = dense<-132> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
%11 = tosa.cast %6 : (tensor<1x2x1xi32>) -> tensor<1x2x1xf32>
%15 = "tosa.const"() <{value = dense<0> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
%16 = tosa.while_loop (%arg1 = %15) : (tensor<1x2x1xi32>) -> tensor<1x2x1xi32> {
%40 = "tosa.const"() <{value = dense<3> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
%41 = tosa.greater %40, %arg1 : (tensor<1x2x1xi32>, tensor<1x2x1xi32>) -> tensor<1x2x1xi1>
%extracted = tensor.extract %41[%arg0, %arg0, %arg0] : tensor<1x2x1xi1>
%from_elements = tensor.from_elements %extracted : tensor<i1>
tosa.yield %from_elements : tensor<i1>
} do {
^bb0(%arg1: tensor<1x2x1xi32>):
%40 = tosa.sin %11 : (tensor<1x2x1xf32>) -> tensor<1x2x1xf32>
%41 = tosa.slice %11 {size = array<i64: 1, 5, 6>, start = array<i64: 0, 9, 25>} : (tensor<1x2x1xf32>) -> tensor<1x5x6xf32>
%42 = tosa.erf %11 : (tensor<1x2x1xf32>) -> tensor<1x2x1xf32>
%43 = tosa.reverse %11 {axis = 0 : i32} : (tensor<1x2x1xf32>) -> tensor<1x2x1xf32>
%44 = tosa.greater_equal %41, %41 : (tensor<1x5x6xf32>, tensor<1x5x6xf32>) -> tensor<1x5x6xi1>
%45 = "tosa.const"() <{value = dense<1> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
%46 = tosa.add %arg1, %45 : (tensor<1x2x1xi32>, tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
tosa.yield %46 : tensor<1x2x1xi32>
}
%17 = tosa.clamp %16 {max_fp = 1.07374182E+9 : f32, max_int = 1073741823 : i64, min_fp = -1.07374182E+9 : f32, min_int = -1073741824 : i64} : (tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
%18 = tosa.clamp %16 {max_fp = 1.07374182E+9 : f32, max_int = 1073741823 : i64, min_fp = -1.07374182E+9 : f32, min_int = -1073741824 : i64} : (tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
%19 = tosa.sub %17, %18 : (tensor<1x2x1xi32>, tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
%cast19 = tensor.cast %19 : tensor<1x2x1xi32> to tensor<*xi32>
call @printMemrefI32(%cast19) : (tensor<*xi32>) -> ()
return
}
}
```
### 2. **Command to Run without `linalg-specialize-generic-ops` :**
```
/data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt /data/szy/workspace/mlir-inconsistent/a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-scf \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-tensor -tosa-to-arith -convert-scf-to-cf -convert-math-to-llvm \
--linalg-fuse-elementwise-ops --cse --linalg-generalize-named-ops -convert-arith-to-llvm \
-one-shot-bufferize="bufferize-function-boundaries" --linalg-fold-unit-extent-dims -finalize-memref-to-llvm --expand-strided-metadata \
-convert-linalg-to-affine-loops -convert-cf-to-llvm -convert-index-to-llvm -finalize-memref-to-llvm -lower-affine -convert-scf-to-cf \
-convert-arith-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```
### 3. **Output without `linalg-specialize-generic-ops` :**
```
[[[0],
[0]]]
```
### 4. **Command to Run with `linalg-specialize-generic-ops` :**
```
/data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt /data/szy/workspace/mlir-inconsistent/a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-scf \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-tensor -tosa-to-arith -convert-scf-to-cf -convert-math-to-llvm \
--linalg-fuse-elementwise-ops --cse --linalg-generalize-named-ops -convert-arith-to-llvm \
-one-shot-bufferize="bufferize-function-boundaries" --linalg-fold-unit-extent-dims -finalize-memref-to-llvm --expand-strided-metadata \
--linalg-specialize-generic-ops -convert-linalg-to-affine-loops -convert-cf-to-llvm -convert-index-to-llvm -finalize-memref-to-llvm -lower-affine -convert-scf-to-cf \
-convert-arith-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```
### 5. **Output with `linalg-specialize-generic-ops` :**
```
[[[3],
[3]]]
```
### 6. **Analysis for this case :**
This MLIR program is expected to output `[0, 0]` for `%19 = tosa.sub %17, %18`, given that `%17` and `%18` are both equal to `%16`. Instead, it outputs `[3, 3]`, which is the value of `%16`.
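
As a sanity check on the expected values, the integer data flow of `a.mlir` can be modelled in plain Python. This is only a sketch and does not involve MLIR: tensors are modelled as nested lists, the helper names (`full`, `elementwise`, `clamp`, `run_reference`) are made up for illustration, and the floating-point ops in the loop body are left out because their results are never used.

```python
# Reference model of the integer data flow in a.mlir (plain Python, no MLIR).
# tensor<1x2x1xi32> values are modelled as nested lists of shape 1x2x1.

def full(value):
    """Build a 1x2x1 nested list filled with `value`."""
    return [[[value], [value]]]

def elementwise(fn, *tensors):
    """Apply `fn` elementwise across 1x2x1 nested lists."""
    return [[[fn(*(t[0][i][0] for t in tensors))] for i in range(2)]]

def clamp(x, lo=-1073741824, hi=1073741823):
    """Integer clamp with the min_int/max_int bounds used by tosa.clamp."""
    return max(lo, min(hi, x))

def run_reference():
    counter = full(0)   # %15
    limit = full(3)     # %40 in the condition region
    # Condition: tosa.greater %40, %arg1, then extract element [0, 0, 0].
    while limit[0][0][0] > counter[0][0][0]:
        # Loop body: only tosa.add feeds the yielded value (%46).
        counter = elementwise(lambda a, b: a + b, counter, full(1))
    v16 = counter
    v17 = elementwise(clamp, v16)
    v18 = elementwise(clamp, v16)
    v19 = elementwise(lambda a, b: a - b, v17, v18)
    return v16, v19

v16, v19 = run_reference()
print(v16)  # [[[3], [3]]]  loop result %16
print(v19)  # [[[0], [0]]]  value passed to printMemrefI32
```

The model agrees with the output of the pipeline without `linalg-specialize-generic-ops`, and its `%16` matches the value printed by the pipeline with the pass.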
To debug this issue, I printed the IR after each pass and found that the input IR [input.txt](https://github.com/user-attachments/files/19103618/input.txt) is correct before applying the `--linalg-specialize-generic-ops` pass. In that IR, `%reinterpret_cast_24` (the final result) is assigned the value of `%9`, which is a constant with the value 0. However, in the IR after running `--linalg-specialize-generic-ops` ([output.txt](https://github.com/user-attachments/files/19103628/output.txt)), the `linalg.generic` operation is mistakenly rewritten into `linalg.copy`, which propagates the value 3 from `%reinterpret_cast_19` to `%reinterpret_cast_24` and leads to the incorrect final result.
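
The effect described above can be illustrated with a toy model. The attached IR is not reproduced here, so the shape of the op is an assumption: a one-input, one-output elementwise op whose body yields a constant instead of its input. `ToyGeneric`, `shape_only_match`, and `body_aware_match` are hypothetical names and do not correspond to the pass's actual matching code; the sketch only shows why rewriting such an op as a copy changes the result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyGeneric:
    """Hypothetical stand-in for a 1-input, 1-output elementwise linalg.generic."""
    yields_input: bool  # True if the body yields its input block argument
    constant: int = 0   # value yielded when the body ignores the input

    def run(self, src: List[int]) -> List[int]:
        return [x if self.yields_input else self.constant for x in src]


def shape_only_match(op: ToyGeneric) -> bool:
    # Assumed faulty matcher: never inspects what the body yields.
    return True


def body_aware_match(op: ToyGeneric) -> bool:
    # Assumed correct matcher: a copy must yield the input block argument.
    return op.yields_input


def specialize(op: ToyGeneric, is_copy: Callable[[ToyGeneric], bool]):
    """Return the function the op computes after toy specialization."""
    if is_copy(op):
        return lambda src: list(src)  # behaves like linalg.copy
    return op.run                     # left as the original generic


store_zero = ToyGeneric(yields_input=False, constant=0)  # models the store of %9
loop_result = [3, 3]                                     # models %reinterpret_cast_19

wrong = specialize(store_zero, shape_only_match)(loop_result)
right = specialize(store_zero, body_aware_match)(loop_result)
print(wrong)  # [3, 3]
print(right)  # [0, 0]
```

In this model, the shape-only matcher reproduces the `[3, 3]` seen with the pass enabled, while the body-aware matcher keeps the `[0, 0]` seen without it.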

