Issue 90416
Summary [mlir][Aarch64] Improve i8mm instruction sequence for `vector.contract`
Labels mlir, mlir:neon
Assignees KoolJBlack, banach-space
Reporter dcaballe
The i8mm lowering for some `vector.contract` ops is functionally correct, but there is room for improvement performance-wise. Looking at the generated assembly for an `mmt4d` with 2x2x8 innermost tile sizes, we get:

```
    1470: 6e180483      mov     v3.d[1], v4.d[0]
    1474: 4e006204      tbl     v4.16b, { v16.16b, v17.16b, v18.16b, v19.16b }, v0.16b
    1478: 4e84a462      smmla   v2.4s, v3.16b, v4.16b
    147c: 6e024041      ext     v1.16b, v2.16b, v2.16b, #0x8
```
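For reference, the single `smmla` in the listing covers the whole 2x2x8 inner tile: a 2x8 int8 LHS tile times a transposed 2x8 int8 RHS tile, accumulated into a 2x2 int32 tile. The following is a minimal scalar sketch of what that instruction computes, written from the Arm architectural description of SMMLA rather than from the MLIR pattern; `smmla` is a hypothetical helper name, and int32 wraparound of the accumulator is not modeled.

```python
# Scalar reference model of AArch64 SMMLA (FEAT_I8MM), for illustration only.
# Vd.4S (2x2 int32, row-major) += A * B^T, where Vn.16B holds the two 8-element
# rows of A and Vm.16B holds the two 8-element rows of B (transposed RHS).
# Assumption: Python ints are used, so int32 overflow/wraparound is NOT modeled.

def smmla(acc, vn, vm):
    """acc: 4 int32 values (2x2 row-major); vn, vm: 16 int8 values each."""
    assert len(acc) == 4 and len(vn) == 16 and len(vm) == 16
    out = list(acc)
    for i in range(2):          # row of the LHS tile
        for j in range(2):      # row of the transposed RHS tile
            out[i * 2 + j] += sum(vn[i * 8 + k] * vm[j * 8 + k] for k in range(8))
    return out


if __name__ == "__main__":
    lhs = [1] * 8 + [2] * 8           # 2x8 LHS tile: a row of 1s, a row of 2s
    rhs = list(range(8)) + [-1] * 8   # 2x8 transposed RHS tile
    print(smmla([0, 0, 0, 0], lhs, rhs))  # prints [28, -8, 56, -16]
```

Every element of the 2x2 result is an 8-way dot product, so one instruction does all the arithmetic for the tile; anything else in the inner loop is data movement.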

What caught my attention are the `mov` instruction (especially its `d[1]` ← `d[0]` lane indexing) and the `tbl` and `ext` instructions. This may not seem like a big deal, but the problem gets much worse with larger tile sizes: we observed long sequences of `mov` and `ext` instructions all over the generated code.
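To make concrete that `mov` and `ext` only shuffle data, here is a minimal byte-level sketch of the two forms that appear in the listing, treating a 128-bit NEON register as a list of 16 bytes with lane 0 first. The helper names are hypothetical; the semantics follow the Arm description of `MOV` (element insert) and `EXT`.

```python
# Byte-level models of two NEON data-movement instructions (illustration only).
# A 128-bit register is a list of 16 byte values, byte 0 = least significant.

def mov_d_lane(vd, dst_lane, vn, src_lane):
    """mov vd.d[dst_lane], vn.d[src_lane]: copy one 64-bit lane, keep the other."""
    out = list(vd)
    out[dst_lane * 8:(dst_lane + 1) * 8] = vn[src_lane * 8:(src_lane + 1) * 8]
    return out


def ext_16b(vn, vm, imm):
    """ext vd.16b, vn.16b, vm.16b, #imm: bytes imm..15 of vn, then bytes 0..imm-1 of vm."""
    assert 0 <= imm < 16
    return (list(vn) + list(vm))[imm:imm + 16]


if __name__ == "__main__":
    v3, v4 = list(range(16)), list(range(100, 116))
    # Low half of v3 is kept; high half now holds the low half of v4.
    print(mov_d_lane(v3, 1, v4, 0))
    # Same register as both sources with #8: the two 64-bit halves are swapped.
    print(ext_16b(v3, v3, 8))
```

So `mov v3.d[1], v4.d[0]` glues two 64-bit halves (for example, two 8-byte rows) into one `smmla` operand, and `ext ..., v2, v2, #0x8` moves the upper half of the 2x2 accumulator into the lower half. Neither does arithmetic, which is why long runs of them around `smmla` are pure overhead.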

We should investigate what is going on and try to fix the problem. My suspicion is that this [zero initialization and insertion](https://github.com/llvm/llvm-project/blob/aafed3408e7269c42f974189198a47eb6dd2fc84/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp#L178-L185) for `vecmat` cases might be behind some of these instructions. We should check whether using `llvm.undef` fixes part of the problem.
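Why `llvm.undef` should be safe can be sketched with a scalar model. Assume (this is an assumption about the linked code, not a description of it) that for `vecmat` the single 1x8 LHS row is inserted into a vector to form the 2x8 operand `smmla` expects, and that only row 0 of the 2x2 result is kept. Row 0 of the result depends only on LHS row 0, so whatever fills the padding row never reaches the kept values. `smmla` and `vecmat_row` below are hypothetical helpers.

```python
# Sketch: the padding row of a widened vecmat LHS does not affect the kept row.
# Assumption: vecmat widens a 1x8 LHS row to 2x8 and keeps only row 0 of the
# 2x2 smmla result. Illustrative model only, not the MLIR lowering pattern.

def smmla(acc, vn, vm):
    # Scalar SMMLA model: acc is 2x2 int32 (row-major); vn, vm hold two 8-byte rows.
    out = list(acc)
    for i in range(2):
        for j in range(2):
            out[i * 2 + j] += sum(vn[i * 8 + k] * vm[j * 8 + k] for k in range(8))
    return out


def vecmat_row(acc_row, lhs_row, rhs, pad):
    """acc_row: 2 int32; lhs_row: 8 int8; rhs: 16 int8 (2x8, transposed); pad: 8 filler bytes."""
    vn = list(lhs_row) + list(pad)   # row 0 = real data, row 1 = padding
    acc = list(acc_row) + [0, 0]     # unused result row (its contents are irrelevant)
    return smmla(acc, vn, rhs)[:2]   # keep only row 0 of the 2x2 result


if __name__ == "__main__":
    lhs, rhs = [1, 2, 3, 4, 5, 6, 7, 8], [1] * 8 + [2] * 8
    zero_pad = vecmat_row([0, 0], lhs, rhs, [0] * 8)
    junk_pad = vecmat_row([0, 0], lhs, rhs, [127, -128, 5, -7, 99, -1, 3, 42])
    print(zero_pad == junk_pad)  # prints True
```

Under that assumption, zeroing the padding row buys nothing for correctness, so an undefined value lets the backend skip materializing the zero vector and the insert.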
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
