Issue 90025
Summary Running LLVM IR causes segfault on macOS aarch64 but not on x86_64
Labels new issue
Assignees
Reporter hameerabbasi
TL;DR: Both platforms produce identical LLVM IR, but only aarch64 crashes, and it does so consistently. It looks like an ABI issue, judging by what passing values in and adding print statements shows.

Hello, I have a version of the LLVM toolchain at commit `c2a98fdeb3aede1a8db492a6ea30f4fa85b60edc` compiled on both an aarch64 and x86_64 Mac with the following commands:

```bash
cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS="mlir;llvm;clang;lld" \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="Native" \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON
ninja
```

## What I've found so far
I'm attaching the full reproducer here: [reproducer.zip](https://github.com/llvm/llvm-project/files/15105804/reproducer.zip)

After running `./compile.sh`, `sddmm_opt.mlir` has the following function signature (reorganized for style):

```ll
llvm.func @_mlir_ciface_sddmm_kernel(
    %arg0: !llvm.ptr, // to output struct holding memref<?x?xi64>, memref<?x?xf64>, memref<?x?xf64>, type of %arg6
    %arg1: !llvm.ptr, // to memref<?x?xf64>
    %arg2: !llvm.ptr, // to memref<?x?xf64>
    %arg3: !llvm.ptr, // to memref<?xi64>
    %arg4: !llvm.ptr, // to memref<?xi64>
    %arg5: !llvm.ptr, // to memref<?xf64>
    %arg6: !llvm.struct<(array<2 x i64>, array<3 x i64>)>) attributes {llvm.emit_c_interface} {
  ...
}
```

After adding some `sparse_tensor.print` ops to the original file, linking with the right `*.o` files, and reading the generated function definition, I found that on aarch64 the last argument, `%arg6`, isn’t passed in properly: it seems to contain garbage values. Since it holds some loop endpoints, the kernel then, of course, segfaults.

Switching `sddmm.mlir` to a simpler kernel that just prints and returns its arguments shows that the last field of `%arg0`, which has the same type as `%arg6`, comes back correctly: the right values are copied over. So returning isn’t the issue; passing the struct into MLIR is what seems to be broken.

So, looking at the part of the code defining the input struct corresponding to `%arg6`, we see:

```c++
struct Shapes {
  intptr_t sizes[2];
  intptr_t lengths[3];
};
```

which contains no pointers or “shared” data, so that shouldn’t be an issue either.
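The "plain data" claim can be checked at compile time. The following is a minimal sketch, assuming a 64-bit target where `intptr_t` is 8 bytes; the helpers `shapes_layout_matches_ir` and `check_shapes_layout` are hypothetical names introduced here for illustration and are not part of the reproducer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct Shapes {
  intptr_t sizes[2];
  intptr_t lengths[3];
};

// Plain data: bytewise-copyable, with a C-compatible layout.
static_assert(std::is_trivially_copyable<Shapes>::value,
              "Shapes should be plain data");
static_assert(std::is_standard_layout<Shapes>::value,
              "Shapes should have a C-compatible layout");

// Hypothetical helper: true when the in-memory layout matches
// { [2 x i64], [3 x i64] } (assumes a 64-bit target).
constexpr bool shapes_layout_matches_ir() {
  return sizeof(intptr_t) == 8 && sizeof(Shapes) == 40 &&
         offsetof(Shapes, lengths) == 16 && alignof(Shapes) == 8;
}

// Hypothetical runtime check, e.g. to call once before invoking the kernel.
inline void check_shapes_layout() { assert(shapes_layout_matches_ir()); }
```

If these checks pass on both machines, the in-memory layout of the struct is unlikely to be the culprit, which leaves the way the argument is passed.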

Digging deeper, the generated `sddmm_opt.ll` file appears to be identical across both machines and contains nothing platform-specific other than the alignment (8 bytes on both systems). Here is the LLVM-generated function signature:

```ll
define void @_mlir_ciface_sddmm_kernel(ptr %0, ptr %1, ptr %2, ptr %3, ptr %4, ptr %5, { [2 x i64], [3 x i64] } %6) {
  ...
}
```

## To run the reproducer
To reproduce, out of the full set of files (included for posterity), you only need three files besides your LLVM build:
1. `compile.sh` 
2. `sddmm.mlir`
3. Either `sddmm_test.cpp` or `sddmm_test.py`

Just edit the paths (line 6 of `compile.sh` and line 121 of `sddmm_test.py`) and run either `./a.out` or `python sddmm_test.py`.