Issue |
154180
|
Summary |
[X86][Sched][Atom] Correct InstRW mapping for CVTSI642SSrm to 8c/4uops group
|
Labels |
|
Assignees |
|
Reporter |
ms178
|
**Disclaimer:** I am enthusiast user with absolutely no programming background but with a long interest in compilers and other open source projects and recently began tinkering with some parts of the LLVM code base with AI. I found some things that might be of interest to the LLVM community. From the "[Contributing to LLVM](https://llvm.org/docs/Contributing.html)" guidline, I take it that bug reports/issues are welcome, even when AI did the heavy lifting (at least there is no explicit AI exclusion mentioned there). As I ran into issues with some LLVM developers in the recent past who were strictly opposed to being confronted with AI usage, I will make it clear to stop filing such issues if the LLVM community has no interest to be confronted with these.
I am filing my findings as "Issues" and not as MR's for now as someone with actual programming skills might be better suited to bring these over the finish line in code review.
**AI tool used:** GPT5-High
**Patch:**
```
>From 22e142293065bf7313e2e4694bca59c098601073 Mon Sep 17 00:00:00 2001
From: ms178 <m.seyfa...@gmail.com>
Date: Sun, 17 Aug 2025 16:58:17 +0200
Subject: [PATCH] [X86][Sched][Atom] Correct InstRW mapping for CVTSI642SSrm to
8c/4uops group
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Atom in-order scheduling model mis-mapped the memory-source form of
CVTSI642SS (CVTSI642SSrm and CVTSI642SSrm_Int) to the 7-cycle group
(AtomWrite0_1_7_4), which is used for the register-source form (…rr).
The file already defines the correct 8-cycle/4-uop group
(AtomWrite0_1_8_4) for the rm variant, but it wasn’t being used.
This change fixes the InstRW mapping:
- Before: CVTSI642SSrm(_Int) -> AtomWrite0_1_7_4 (7 cycles, 4 uops)
- After: CVTSI642SSrm(_Int) -> AtomWrite0_1_8_4 (8 cycles, 4 uops)
While at it, also add the missing white space formatting to match the surrounding area in the file.
Rationale
- Memory-source conversions are higher latency than register-source on
Atom-class cores; the file’s own SchedWriteRes blocks encode this
distinction. Mapping the rm form to the rr group underestimated the
true latency and led to overly optimistic scheduling on in-order Atom
models.
- This aligns with the pattern used throughout X86ScheduleAtom.td for
other rm vs. rr variants, and with expectations from Agner Fog/uops.info
for Atom-like pipelines where folded-load variants cost an extra cycle.
Impact
- Improves scheduler accuracy for tight loops that perform int64->float
conversions from memory on Atom targets, avoiding back-to-back dependent
issues that can cause pipeline stalls. On in-order cores, this yields a
small but measurable improvement in steady-state IPC in such loops.
- No functional or ABI changes; this only adjusts scheduling metadata.
Testing
- builds the Linux Kernel and Mesa just fine, saw a small consistent performance improvement vs. the baseline with this and some other patches that I'll post soon
Signed-off: Marcus Seyfarth <m.seyfa...@gmail.com>
---
llvm/lib/Target/X86/X86ScheduleAtom.td | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/llvm/lib/Target/X86/X86ScheduleAtom.td b/llvm/lib/Target/X86/X86ScheduleAtom.td
index c92bc97cfb385..e56ca0df30d96 100644
--- a/llvm/lib/Target/X86/X86ScheduleAtom.td
+++ b/llvm/lib/Target/X86/X86ScheduleAtom.td
@@ -553,34 +553,34 @@ def : InstRW<[AtomWrite0_1_5], (instregex "ILD_F(16|32|64)")>;
def AtomWrite0_1_7 : SchedWriteRes<[AtomPort0,AtomPort1]> {
let Latency = 7;
- let ReleaseAtCycles = [6,6];
+ let ReleaseAtCycles = [6, 6];
}
-def : InstRW<[AtomWrite0_1_7], (instregex "CVTSI642SDrm(_Int)?")>;
def AtomWrite0_1_7_4 : SchedWriteRes<[AtomPort0,AtomPort1]> {
let Latency = 7;
- let ReleaseAtCycles = [8,8];
+ let ReleaseAtCycles = [8, 8];
let NumMicroOps = 4;
}
def : InstRW<[AtomWrite0_1_7_4], (instregex "CVTSI642SSrr(_Int)?")>;
def AtomWrite0_1_8_4 : SchedWriteRes<[AtomPort0,AtomPort1]> {
let Latency = 8;
- let ReleaseAtCycles = [8,8];
+ let ReleaseAtCycles = [8, 8];
let NumMicroOps = 4;
}
-def : InstRW<[AtomWrite0_1_7_4], (instregex "CVTSI642SSrm(_Int)?")>;
+
+def : InstRW<[AtomWrite0_1_8_4], (instregex "CVTSI642SSrm(_Int)?")>;
def AtomWrite0_1_9 : SchedWriteRes<[AtomPort0,AtomPort1]> {
let Latency = 9;
- let ReleaseAtCycles = [9,9];
+ let ReleaseAtCycles = [9, 9];
let NumMicroOps = 4;
}
def : InstRW<[AtomWrite0_1_9], (instregex "CVT(T)?SS2SI64rr(_Int)?")>;
def AtomWrite0_1_10 : SchedWriteRes<[AtomPort0,AtomPort1]> {
let Latency = 10;
- let ReleaseAtCycles = [11,11];
+ let ReleaseAtCycles = [11, 11];
let NumMicroOps = 5;
}
def : InstRW<[AtomWrite0_1_10], (instregex "CVT(T)?SS2SI64rm(_Int)?")>;
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs