Issue: 139448
Summary: Forcing Clang (via LLVM) to Maximize AVX512 (zmm) Register Usage
Labels: clang
Assignees:
Reporter: TheBlackPlague

I have a case where, if a function is inlined into another function, the LLVM-generated assembly uses all 32 SIMD registers available with AVX512, `zmm0` through `zmm31`. When the same function is not inlined, it only uses `zmm0` through `zmm4`.
The issue is that, when all registers are available, LLVM automatically splits some of the code into a more cache-friendly form. This is great: it halves the time taken by the function.
However, when it doesn't use all the registers, it emits the cache-unfriendly version.
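The two situations described above could be reduced to something like the sketch below. Everything in it is hypothetical: the `kernel`, `run_inlined`, `kernel_outlined`, and `run_outlined` names, the 16-lane layout, and the use of `always_inline` and `noinline` to pin the inlining decision. The actual code is not shown in this report, and this reduction is not guaranteed to reproduce the register-usage difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: adds `blocks` blocks of 16 ints from `src` into `acc`.
// always_inline asks Clang to inline it into every caller.
static inline __attribute__((always_inline))
void kernel(std::int32_t* acc, const std::int32_t* src, std::size_t blocks) {
    for (std::size_t b = 0; b < blocks; ++b)
        for (std::size_t lane = 0; lane < 16; ++lane)
            acc[lane] += src[b * 16 + lane];
}

// Sums the 16 accumulator lanes into one scalar.
static std::int32_t reduce(const std::int32_t* acc) {
    std::int32_t total = 0;
    for (std::size_t lane = 0; lane < 16; ++lane)
        total += acc[lane];
    return total;
}

// Case A: the kernel body is inlined, so the optimizer sees it together
// with the surrounding code of the caller.
std::int32_t run_inlined(const std::int32_t* src, std::size_t blocks) {
    std::int32_t acc[16] = {};
    kernel(acc, src, blocks);
    return reduce(acc);
}

// Case B: a noinline wrapper keeps the kernel as a standalone function,
// so it is optimized without knowledge of its caller.
__attribute__((noinline))
void kernel_outlined(std::int32_t* acc, const std::int32_t* src, std::size_t blocks) {
    kernel(acc, src, blocks);
}

std::int32_t run_outlined(const std::int32_t* src, std::size_t blocks) {
    std::int32_t acc[16] = {};
    kernel_outlined(acc, src, blocks);
    return reduce(acc);
}
```

Both entry points compute the same result; only the inlining differs. Compiling with something like `clang++ -O3 -DNDEBUG -mavx512f -S` (the `-mavx512f` flag is an assumption, since the report does not state the target flags) would let the assembly of `run_inlined` and `kernel_outlined` be compared side by side.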
I've tried reproducing the cache-friendly version using arrays of `__m512i`, but to no avail; the result is always the cache-unfriendly version. I'm wondering if there's a way to force Clang to use all `zmm` registers (so that it can see that better assembly can be generated).
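For reference, an attempt of that kind might look roughly like the sketch below. It is an assumption-laden illustration, not the code from this report: it uses the GCC/Clang `vector_size` extension in place of `__m512i` so that it compiles without AVX512 flags, and the `v16i32` type, the `sum_blocks` function, and the accumulator count of 8 are all hypothetical. Whether each array element ends up in its own `zmm` register is left to the register allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 16 x int32 = 512 bits. This is a GCC/Clang vector extension type; with
// -mavx512f one value of this type fits in a single zmm register.
typedef std::int32_t v16i32 __attribute__((vector_size(64)));

// Hypothetical number of independent accumulators.
constexpr std::size_t kAccumulators = 8;

// Adds `blocks` blocks of 16 ints round-robin into kAccumulators
// independent vector accumulators, then reduces them to one scalar.
std::int64_t sum_blocks(const std::int32_t* src, std::size_t blocks) {
    v16i32 acc[kAccumulators] = {};  // all lanes start at zero
    for (std::size_t b = 0; b < blocks; ++b) {
        v16i32 v;
        std::memcpy(&v, src + b * 16, sizeof v);  // load without alignment assumptions
        acc[b % kAccumulators] += v;              // lane-wise add
    }
    std::int64_t total = 0;
    for (std::size_t a = 0; a < kAccumulators; ++a)
        for (int lane = 0; lane < 16; ++lane)
            total += acc[a][lane];
    return total;
}
```

The array only suggests a layout to the compiler. Clang is free to spill the accumulators to the stack or merge them, which matches the observation above that this approach did not change the generated code.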
I'm using LLVM 20.1.3 via `clang++` and compiling with `-O3 -DNDEBUG`.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs