https://bugs.llvm.org/show_bug.cgi?id=49499
Bug ID: 49499
Summary: llvm-mca for cortex-a57 gets thrown off by SIMD loads with dependencies (negative latency?)
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: AArch64
Assignee: unassignedb...@nondot.org
Reporter: mar...@martin.st
CC: andrea.dibia...@gmail.com,
arnaud.degrandmai...@arm.com,
llvm-bugs@lists.llvm.org, smithp...@googlemail.com,
ties.st...@arm.com
Given the following instruction series, llvm-mca seems to calculate a sensible
result:
$ cat test.S
add v0.16b, v1.16b, v2.16b
add v1.16b, v3.16b, v0.16b
add v2.16b, v3.16b, v1.16b
$ llvm-mca --mtriple=aarch64-linux-gnu --mcpu=cortex-a57 test.S
Iterations: 100
Instructions: 300
Total Cycles: 903
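As a rough sanity check on why this figure looks sensible: llvm-mca treats the input as a block that is executed repeatedly (100 iterations here), and each add reads the result of the one before it, with the first add reading v1 and v2 from the previous iteration. The 300 adds therefore form one serial chain. The sketch below only divides the numbers reported above; the per-add latency of roughly 3 cycles that it suggests is inferred from this output, not taken from the A57 scheduling model.

```python
# Figures copied from the llvm-mca output above.
iterations = 100
instructions = 300
total_cycles = 903

# With one serial dependency chain, total cycles is roughly
# (number of dependent instructions) * (latency of each instruction).
cycles_per_iteration = total_cycles / iterations
cycles_per_add = total_cycles / instructions

print(f"{cycles_per_iteration:.2f} cycles per iteration")  # 9.03
print(f"{cycles_per_add:.2f} cycles per add")              # 3.01
```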
However, if the series is preceded by a SIMD load into the registers that are
used, the total cycle count ends up lower:
$ cat test2.S
ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0]
add v0.16b, v1.16b, v2.16b
add v1.16b, v3.16b, v0.16b
add v2.16b, v3.16b, v1.16b
$ llvm-mca --mtriple=aarch64-linux-gnu --mcpu=cortex-a57 test2.S
Iterations: 100
Instructions: 400
Total Cycles: 416
The total cycle count has suddenly dropped by more than half, as if the load
had negative latency. If the load targets a different set of registers, it
doesn't affect the calculation in the same way:
$ cat test3.S
ld1 {v16.16b, v17.16b, v18.16b, v19.16b}, [x0]
add v0.16b, v1.16b, v2.16b
add v1.16b, v3.16b, v0.16b
add v2.16b, v3.16b, v1.16b
$ llvm-mca --mtriple=aarch64-linux-gnu --mcpu=cortex-a57 test3.S
Iterations: 100
Instructions: 400
Total Cycles: 904
This doesn't seem to happen for the Cortex-A55 target, though.
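To make the register dependencies in the three test files easier to compare, here is a minimal sketch that lists which registers are carried from one iteration of the block to the next. It is a hand-written model, not llvm-mca's own analysis: the operand sets are transcribed manually from the listings above, the helper name `loop_carried` is hypothetical, and the ld1 is assumed to leave x0 unmodified since no writeback form is used.

```python
# Each instruction is modelled as (registers written, registers read).
# Operands are transcribed by hand from test.S, test2.S and test3.S.
ADDS = [
    ({"v0"}, {"v1", "v2"}),
    ({"v1"}, {"v3", "v0"}),
    ({"v2"}, {"v3", "v1"}),
]
LD1_SAME = ({"v0", "v1", "v2", "v3"}, {"x0"})
LD1_OTHER = ({"v16", "v17", "v18", "v19"}, {"x0"})


def loop_carried(block):
    """Return registers that are read before being written within one
    iteration and are also written somewhere in the block, so the value
    read must come from the previous iteration."""
    written_anywhere = set().union(*(writes for writes, _ in block))
    written_so_far = set()
    carried = set()
    for writes, reads in block:
        carried |= (reads - written_so_far) & written_anywhere
        written_so_far |= writes
    return carried


for name, block in [
    ("test.S", ADDS),
    ("test2.S", [LD1_SAME] + ADDS),
    ("test3.S", [LD1_OTHER] + ADDS),
]:
    print(name, sorted(loop_carried(block)))
# test.S ['v1', 'v2']
# test2.S []
# test3.S ['v1', 'v2']
```

In this simplified model only test2.S has no register carried between iterations, which is also the only case where the cycle count drops. Whether that accounts for the 416-cycle figure, or whether the A57 model mishandles the load, is not established here.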