Issue: 138595
Summary: [MLIR] convert-linalg-to-parallel-loops doesn't seem to parallelize reductions
Labels: (none)
Assignees: (none)
Reporter: brian-kelley
I have a project that uses MLIR to lower linalg/tensor code to scf/memref, and ideally the resulting code exposes all of the parallelism available in the input operations. One of the passes I'm using is ``--convert-linalg-to-parallel-loops``, but for reduction dimensions it falls back to a sequential ``scf.for`` loop instead of emitting ``scf.parallel``/``scf.reduce``.
Below are two simple examples (dot and matmul) where the reduction dimension becomes an ``scf.for`` loop. To reproduce, run them through ``mlir-opt --convert-linalg-to-parallel-loops``.
Is there another way to generate parallel loops from linalg ops that keeps reductions parallel? One alternative I tried is ``--convert-linalg-to-affine-loops --affine-parallelize``, but this likewise parallelizes only the non-reduction dimensions (and it seems the affine dialect can't express a parallel reduction anyway).
dot.mlir:
```mlir
module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<f64>
    linalg.dot ins(%arg0, %arg1 : memref<?xf64>, memref<?xf64>) outs(%alloc : memref<f64>)
    %0 = memref.load %alloc[] : memref<f64>
    return %0 : f64
  }
}
```
result:
```mlir
module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<f64>
    %dim = memref.dim %arg0, %c0 : memref<?xf64>
    scf.for %arg2 = %c0 to %dim step %c1 {
      %1 = memref.load %arg0[%arg2] : memref<?xf64>
      %2 = memref.load %arg1[%arg2] : memref<?xf64>
      %3 = memref.load %alloc[] : memref<f64>
      %4 = arith.mulf %1, %2 : f64
      %5 = arith.addf %3, %4 : f64
      memref.store %5, %alloc[] : memref<f64>
    }
    %0 = memref.load %alloc[] : memref<f64>
    return %0 : f64
  }
}
```
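For comparison, this is roughly the output I would have expected, with the reduction carried as a value through ``scf.parallel``/``scf.reduce``. This is a hand-written sketch (not produced by any pass), and it assumes the current ``scf.reduce`` region syntax:

```mlir
module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %zero = arith.constant 0.0 : f64
    %dim = memref.dim %arg0, %c0 : memref<?xf64>
    // The reduction dimension stays parallel: each iteration contributes
    // one product, and the scf.reduce region says how to combine them.
    %sum = scf.parallel (%i) = (%c0) to (%dim) step (%c1) init (%zero) -> f64 {
      %a = memref.load %arg0[%i] : memref<?xf64>
      %b = memref.load %arg1[%i] : memref<?xf64>
      %prod = arith.mulf %a, %b : f64
      scf.reduce(%prod : f64) {
      ^bb0(%lhs: f64, %rhs: f64):
        %r = arith.addf %lhs, %rhs : f64
        scf.reduce.return %r : f64
      }
    }
    return %sum : f64
  }
}
```

Note this form also makes the temporary ``memref<f64>`` accumulator unnecessary, since the reduction result is an SSA value.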
matmul.mlir:
```mlir
module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    linalg.matmul ins(%arg0, %arg1 : memref<64x64xf64>, memref<64x64xf64>) outs(%alloc : memref<64x64xf64>)
    return %alloc : memref<64x64xf64>
  }
}
```
result:
```mlir
module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %c1 = arith.constant 1 : index
    %c64 = arith.constant 64 : index
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    scf.parallel (%arg2, %arg3) = (%c0, %c0) to (%c64, %c64) step (%c1, %c1) {
      scf.for %arg4 = %c0 to %c64 step %c1 {
        %0 = memref.load %arg0[%arg2, %arg4] : memref<64x64xf64>
        %1 = memref.load %arg1[%arg4, %arg3] : memref<64x64xf64>
        %2 = memref.load %alloc[%arg2, %arg3] : memref<64x64xf64>
        %3 = arith.mulf %0, %1 : f64
        %4 = arith.addf %2, %3 : f64
        memref.store %4, %alloc[%arg2, %arg3] : memref<64x64xf64>
      }
      scf.reduce
    }
    return %alloc : memref<64x64xf64>
  }
}
```
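And the matmul output I would have expected, with the k dimension expressed as a nested parallel reduction rather than an inner ``scf.for``. Again a hand-written sketch assuming the current ``scf.reduce`` region syntax:

```mlir
module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c64 = arith.constant 64 : index
    %zero = arith.constant 0.0 : f64
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    scf.parallel (%i, %j) = (%c0, %c0) to (%c64, %c64) step (%c1, %c1) {
      // Inner reduction over k, kept parallel by yielding one product per
      // iteration and combining with addf in the scf.reduce region.
      %sum = scf.parallel (%k) = (%c0) to (%c64) step (%c1) init (%zero) -> f64 {
        %a = memref.load %arg0[%i, %k] : memref<64x64xf64>
        %b = memref.load %arg1[%k, %j] : memref<64x64xf64>
        %p = arith.mulf %a, %b : f64
        scf.reduce(%p : f64) {
        ^bb0(%lhs: f64, %rhs: f64):
          %r = arith.addf %lhs, %rhs : f64
          scf.reduce.return %r : f64
        }
      }
      memref.store %sum, %alloc[%i, %j] : memref<64x64xf64>
      scf.reduce
    }
    return %alloc : memref<64x64xf64>
  }
}
```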
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs