| Field | Value |
| --- | --- |
| Issue | 90416 |
| Summary | [mlir][Aarch64] Improve i8mm instruction sequence for `vector.contract` |
| Labels | mlir, mlir:neon |
| Assignees | KoolJBlack, banach-space |
| Reporter | dcaballe |

The i8mm lowering for some `vector.contract` ops is currently functionally correct, but there is room for improvement performance-wise. Looking at the generated asm for an mmt4d with 2x2x8 innermost tile sizes, we get:
```
1470: 6e180483 mov v3.d[1], v4.d[0]
1474: 4e006204 tbl v4.16b, { v16.16b, v17.16b, v18.16b, v19.16b }, v0.16b
1478: 4e84a462 smmla v2.4s, v3.16b, v4.16b
147c: 6e024041 ext v1.16b, v2.16b, v2.16b, #0x8
```
What catches my attention is the `mov` instruction (especially the lane indexing, which writes `d[1]` from `d[0]`), the `tbl`, and the `ext`. This may not seem like a big deal, but the problem gets much worse with larger tile sizes, where we observed long sequences of `mov` and `ext` instructions all over the place.
We should investigate what is going on and try to fix the problem. My suspicion is that this [zero initialization and insertion](https://github.com/llvm/llvm-project/blob/aafed3408e7269c42f974189198a47eb6dd2fc84/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp#L178-L185) for `vecmat` cases might be behind some of these instructions. We should check whether using `llvm.undef` instead fixes part of the problem.
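
To make the suspicion concrete, here is a rough Python model of what `smmla` computes and of how a `vecmat` (1x8 LHS) could be padded to fit it. This is only a sketch of the idea, not the code in the linked pattern: `smmla`, `vecmat_via_smmla`, and the `filler` parameter are hypothetical names made up for illustration, and the int8/int32 ranges are not enforced.

```python
def smmla(acc, a, b):
    """Reference model: acc (2x2 i32) += a (2x8 i8) * transpose(b) (b is 2x8 i8).

    All operands are flat, row-major lists, mirroring the vector registers.
    """
    out = list(acc)
    for i in range(2):
        for j in range(2):
            out[2 * i + j] += sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
    return out


def vecmat_via_smmla(acc_row, lhs_row, rhs, filler=0):
    """Hypothetical vecmat lowering: pad the 1x8 LHS and 1x2 acc to full tiles.

    `filler` stands in for the padding value (0 models zero initialization;
    any other value models "don't care" contents such as an undef vector).
    """
    padded_lhs = list(lhs_row) + [filler] * 8   # second LHS row is padding
    padded_acc = list(acc_row) + [filler] * 2   # second acc row is padding
    full = smmla(padded_acc, padded_lhs, rhs)
    return full[:2]                             # keep only the first result row


lhs = list(range(1, 9))      # 1x8 LHS row
rhs = [1] * 8 + [2] * 8      # 2x8 RHS (already transposed)
zero_padded = vecmat_via_smmla([10, 20], lhs, rhs, filler=0)
junk_padded = vecmat_via_smmla([10, 20], lhs, rhs, filler=99)
assert zero_padded == junk_padded
```

The padding rows only feed the second row of the 2x2 result, which is discarded, so in this model the extracted row does not depend on what the padding contains. That is why replacing the zero-initialized vector with an undefined one looks worth trying; whether it actually removes the `mov`/`ext` instructions in the generated code still needs to be verified.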