[llvm-bugs] [Bug 139448] Forcing Clang (via LLVM) to Maximize AVX512 (zmm) Register Usage

LLVM Bugs via llvm-bugs Sun, 11 May 2025 06:02:21 -0700

Issue	139448
Summary	Forcing Clang (via LLVM) to Maximize AVX512 (zmm) Register Usage
Labels	clang
Assignees
Reporter	TheBlackPlague

I have a case where, if a function is inlined into its caller, the LLVM-generated assembly uses all 32 AVX-512 SIMD registers, `zmm0` through `zmm31`. When the same function is not inlined, the generated code uses only `zmm0` through `zmm4`.


This matters because, when all registers are in use, LLVM automatically splits part of the code into a more cache-friendly form. This works very well: it halves the time taken by the function.

However, when it doesn't use all the registers, LLVM emits the cache-unfriendly version.

I've tried to reproduce the cache-friendly version by hand using arrays of `__m512i`, but to no avail: the result is always the cache-unfriendly version. Is there a way to force Clang to use all `zmm` registers, so that it can see that better assembly can be generated?

I'm using LLVM 20.1.3 via `clang++` and compiling with `-O3 -DNDEBUG`.
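Since the full-register version only appears when the function is inlined, one thing that might be worth trying is forcing the inlining with Clang's `always_inline` attribute. The sketch below is only an illustration of that idea, not the code from this report: `add_block` and `sum_rows` are hypothetical names, the kernel is a stand-in, and it uses plain loops rather than `__m512i` intrinsics so that it builds on any target. There is no guarantee that the register allocator will then use all of `zmm0` through `zmm31`; it only removes the call boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel standing in for the function from the report.
// always_inline asks Clang to inline it into every caller, so the
// optimizer sees caller and callee as one body.
__attribute__((always_inline)) inline
void add_block(std::int32_t* acc, const std::int32_t* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        acc[i] += src[i];
    }
}

// Hypothetical caller: sums `rows` rows of 16 int32 values each.
std::int32_t sum_rows(const std::int32_t* data, std::size_t rows) {
    assert(data != nullptr);            // compiled out under -DNDEBUG
    constexpr std::size_t kWidth = 16;  // 16 x int32 = 512 bits, one zmm register
    std::int32_t acc[kWidth] = {};
    for (std::size_t r = 0; r < rows; ++r) {
        add_block(acc, data + r * kWidth, kWidth);
    }
    std::int32_t total = 0;
    for (std::size_t i = 0; i < kWidth; ++i) {
        total += acc[i];
    }
    return total;
}
```

To check which registers end up being used, the assembly can be inspected by compiling with `-S` in addition to `-O3 -DNDEBUG` and an AVX-512 target flag such as `-mavx512f`.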

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
