Issue 130002
Summary [mlir] Inconsistent output when executing MLIR program with `linalg-specialize-generic-ops`
Labels mlir
Assignees
Reporter Emilyaxe
    git version: 953838d

system: `Ubuntu 18.04.6 LTS`

## Description:
I am experiencing an inconsistent result when executing the same MLIR program with and without `linalg-specialize-generic-ops`.


## Steps to Reproduce:

### 1. **MLIR Program (a.mlir)**:
a.mlir: 
``` 
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() -> () {
    %arg0 = index.constant 0
    %6 = "tosa.const"() <{value = dense<-132> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
    %11 = tosa.cast %6 : (tensor<1x2x1xi32>) -> tensor<1x2x1xf32>
    %15 = "tosa.const"() <{value = dense<0> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
    %16 = tosa.while_loop (%arg1 = %15) : (tensor<1x2x1xi32>) -> tensor<1x2x1xi32> {
      %40 = "tosa.const"() <{value = dense<3> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
      %41 = tosa.greater %40, %arg1 : (tensor<1x2x1xi32>, tensor<1x2x1xi32>) -> tensor<1x2x1xi1>
      %extracted = tensor.extract %41[%arg0, %arg0, %arg0] : tensor<1x2x1xi1>
      %from_elements = tensor.from_elements %extracted : tensor<i1>
      tosa.yield %from_elements : tensor<i1>
    } do {
    ^bb0(%arg1: tensor<1x2x1xi32>):
      %40 = tosa.sin %11 : (tensor<1x2x1xf32>) -> tensor<1x2x1xf32>
      %41 = tosa.slice %11 {size = array<i64: 1, 5, 6>, start = array<i64: 0, 9, 25>} : (tensor<1x2x1xf32>) -> tensor<1x5x6xf32>
      %42 = tosa.erf %11 : (tensor<1x2x1xf32>) -> tensor<1x2x1xf32>
      %43 = tosa.reverse %11 {axis = 0 : i32} : (tensor<1x2x1xf32>) -> tensor<1x2x1xf32>
      %44 = tosa.greater_equal %41, %41 : (tensor<1x5x6xf32>, tensor<1x5x6xf32>) -> tensor<1x5x6xi1>
      %45 = "tosa.const"() <{value = dense<1> : tensor<1x2x1xi32>}> : () -> tensor<1x2x1xi32>
      %46 = tosa.add %arg1, %45 : (tensor<1x2x1xi32>, tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
      tosa.yield %46 : tensor<1x2x1xi32>
    }
    %17 = tosa.clamp %16 {max_fp = 1.07374182E+9 : f32, max_int = 1073741823 : i64, min_fp = -1.07374182E+9 : f32, min_int = -1073741824 : i64} : (tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
    %18 = tosa.clamp %16 {max_fp = 1.07374182E+9 : f32, max_int = 1073741823 : i64, min_fp = -1.07374182E+9 : f32, min_int = -1073741824 : i64} : (tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
    %19 = tosa.sub %17, %18 : (tensor<1x2x1xi32>, tensor<1x2x1xi32>) -> tensor<1x2x1xi32>
    %cast19 = tensor.cast %19 : tensor<1x2x1xi32> to tensor<*xi32>
    call @printMemrefI32(%cast19) : (tensor<*xi32>) -> ()
    return
  }

}

``` 


### 2. **Command to Run without `linalg-specialize-generic-ops`:**

``` 
/data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt /data/szy/workspace/mlir-inconsistent/a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-scf  \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-tensor -tosa-to-arith  -convert-scf-to-cf  -convert-math-to-llvm \
--linalg-fuse-elementwise-ops --cse --linalg-generalize-named-ops -convert-arith-to-llvm \
-one-shot-bufferize="bufferize-function-boundaries" --linalg-fold-unit-extent-dims   -finalize-memref-to-llvm --expand-strided-metadata  \
-convert-linalg-to-affine-loops -convert-cf-to-llvm  -convert-index-to-llvm  -finalize-memref-to-llvm -lower-affine  -convert-scf-to-cf   \
-convert-arith-to-llvm -finalize-memref-to-llvm   -convert-func-to-llvm  -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

``` 

### 3. **Output without `linalg-specialize-generic-ops`:**

``` 
[[[0],
  [0]]]

``` 

### 4. **Command to Run with `linalg-specialize-generic-ops`:**


``` 
/data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt /data/szy/workspace/mlir-inconsistent/a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-scf  \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-tensor -tosa-to-arith  -convert-scf-to-cf  -convert-math-to-llvm \
--linalg-fuse-elementwise-ops --cse --linalg-generalize-named-ops -convert-arith-to-llvm \
-one-shot-bufferize="bufferize-function-boundaries" --linalg-fold-unit-extent-dims   -finalize-memref-to-llvm --expand-strided-metadata  \
--linalg-specialize-generic-ops -convert-linalg-to-affine-loops  -convert-cf-to-llvm  -convert-index-to-llvm -finalize-memref-to-llvm  -lower-affine  -convert-scf-to-cf \
-convert-arith-to-llvm  -finalize-memref-to-llvm   -convert-func-to-llvm -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

``` 

### 5. **Output with `linalg-specialize-generic-ops`:**

``` 
[[[3],
  [3]]]
``` 

### 6. **Analysis for this case:**

This MLIR program is expected to print `[0, 0]` for `%19 = tosa.sub %17, %18`, given that `%17` and `%18` are both equal to `%16`. Instead, it prints `[3, 3]`, which is the value of `%16`.

To debug this issue, I printed the IR after each pass and found that the IR is still correct immediately before the `--linalg-specialize-generic-ops` pass runs ([input.txt](https://github.com/user-attachments/files/19103618/input.txt)). As shown in the first image, `%reinterpret_cast_24` (the final result) is assigned the value of `%9`, which is a constant with the value 0. After `--linalg-specialize-generic-ops` runs ([output.txt](https://github.com/user-attachments/files/19103628/output.txt)), the second image shows that the `linalg.generic` operation has been mistakenly rewritten into a `linalg.copy`, which propagates the value 3 from `%reinterpret_cast_19` to `%reinterpret_cast_24` and leads to the incorrect final result.

![Image](https://github.com/user-attachments/assets/a0a3d514-e7e5-4f6f-afd6-c676b518975e)

![Image](https://github.com/user-attachments/assets/3610a0df-fd35-4fdf-823c-0f253375aa37)
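
As a sanity check on the expected output, the integer data path of `a.mlir` can be modeled in plain Python. This is only an illustrative sketch and not part of the original reproduction: the helper names (`splat`, `elementwise`, `run_reference`) are hypothetical, and the floating-point ops in the loop body are left out on the assumption that their results are unused and do not influence `%19`.

```python
# Reference model of the integer data path in a.mlir (plain Python, no MLIR).
# Tensors of shape 1x2x1 are modeled as nested lists.

# min_int / max_int attributes taken from the tosa.clamp ops in a.mlir.
INT_MIN, INT_MAX = -1073741824, 1073741823


def splat(value):
    """Build a 1x2x1 tensor filled with `value`, like dense<value>."""
    return [[[value], [value]]]


def elementwise(fn, *tensors):
    """Apply `fn` elementwise across one or more 1x2x1 nested lists."""
    return [[[fn(*(t[0][i][0] for t in tensors))] for i in range(2)]]


def run_reference():
    acc = splat(0)  # %15, the initial loop-carried value
    # Loop condition: element [0, 0, 0] of (3 > %arg1).
    while elementwise(lambda a, b: a > b, splat(3), acc)[0][0][0]:
        acc = elementwise(lambda a, b: a + b, acc, splat(1))  # %46
    clamp = lambda v: max(INT_MIN, min(INT_MAX, v))
    v17 = elementwise(clamp, acc)  # %17
    v18 = elementwise(clamp, acc)  # %18
    v19 = elementwise(lambda a, b: a - b, v17, v18)  # %19
    return acc, v19


loop_result, final = run_reference()
print(loop_result)  # value of %16
print(final)        # value of %19
```

Under these assumptions the model yields 3 in every element of `%16` and 0 in every element of `%19`, which matches the output of the pipeline without `linalg-specialize-generic-ops`.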

