[llvm-bugs] [Bug 136374] Arm Neoverse scheduling models have a way too large decode bandwidth (about 2x the actual)

LLVM Bugs via llvm-bugs Fri, 18 Apr 2025 14:52:56 -0700

Issue	136374
Summary	Arm Neoverse scheduling models have a way too large decode bandwidth (about 2x the actual)
Labels	new issue
Assignees
Reporter	camel-cdr

I noticed that the Arm Neoverse scheduling models have a decode bandwidth that is far too large: https://godbolt.org/z/54hPqeqdK


I tested how many independent adds per cycle llvm-mca thinks each core can decode and compared that with the documented decode width:

* CPU: llvm-mca vs. Arm Software Optimization Guide, section 4.1 "Dispatch constraints"
* Neoverse-V1: 15 vs 8
* Neoverse-V2: 16 vs 8
* Neoverse-V3: 16 vs 10
* Neoverse-N1: 8 vs 4
* Neoverse-N2: 10 vs 5
* Neoverse-N3: 10 vs 5
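
To make the "about 2x" claim concrete, here is a small sketch that computes the overestimation factor for each core from the numbers listed above. The dictionary and function names are illustrative only and are not part of LLVM or llvm-mca.

```python
# Widths copied from the list above: (llvm-mca result, Optimization Guide value).
# The names `widths` and `overestimate` are hypothetical, chosen for this sketch.
widths = {
    "Neoverse-V1": (15, 8),
    "Neoverse-V2": (16, 8),
    "Neoverse-V3": (16, 10),
    "Neoverse-N1": (8, 4),
    "Neoverse-N2": (10, 5),
    "Neoverse-N3": (10, 5),
}


def overestimate(mca_width, documented_width):
    """Return how many times wider the modelled decode is than the documented one."""
    return mca_width / documented_width


ratios = {cpu: overestimate(*pair) for cpu, pair in widths.items()}

for cpu, ratio in ratios.items():
    print(f"{cpu}: {ratio:.2f}x")
```

With these inputs, four of the six cores come out at exactly 2x, while the V1 and V3 come out somewhat lower.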

The decode/issue width currently used in the scheduling models seems to correspond to the number of uops that can be processed per cycle, not the number of MOPs that are decoded or read from the opcache.
Still, unless the cores are capable of fusing independent additions, they shouldn't be able to decode the instructions this quickly.

Here is a code snippet where the additional decode capabilities cause an impossible result: https://godbolt.org/z/GbGrKWxsq
Here llvm-mca reports that the V1 executes a 13-instruction loop at 13 IPC, even though the core should only be able to decode up to 8 instructions per cycle.
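
The inconsistency can be checked with simple arithmetic. The sketch below assumes decode is the only limit and that each instruction decodes to one MOP; the helper names are hypothetical and do not come from llvm-mca.

```python
from fractions import Fraction


def min_cycles_per_iteration(loop_instructions, decode_width):
    """Lower bound on cycles per loop iteration if decode is the only limit.

    Assumes one MOP per instruction and an idealized front end that can
    decode across iteration boundaries.
    """
    return Fraction(loop_instructions, decode_width)


def is_plausible(reported_ipc, decode_width):
    """Sustained IPC cannot exceed the number of instructions decoded per cycle."""
    return reported_ipc <= decode_width


# Numbers from the report: a 13-instruction loop on Neoverse-V1 (decode width 8).
bound = min_cycles_per_iteration(13, 8)
print(f"minimum cycles per iteration: {float(bound)}")
print(f"13 IPC plausible at decode width 8: {is_plausible(13, 8)}")
```

Under these assumptions the loop needs at least 13/8 cycles per iteration, so a reported 13 IPC (one iteration per cycle) exceeds what the documented decode width allows.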

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
