On 10/20/24 1:40 PM, Vineet Gupta wrote:
Background
----------
sched1 runs a preliminary "model schedular" ahead of the main list schedular.
Its sole purpose is to keep register pressure to mimimum [1] and it uses DFA
register depenendency tracking to arrange insns.
[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2011-12/msg01684.html
`The idea was to construct a preliminary
"model" schedule in which the only objective is to keep register
pressure to a minimum. This schedule ignores pipeline characteristics,
latencies, and the number of available registers. The maximum pressure
seen in this initial model schedule (MP) is then the benchmark for ECC(X).`
It starts off with an intial "worklist" of insns w/o any prior dependencies,
scheduling them, adding successors of scheduled insn to the worklist and so
on until all insns in the basic block are done.
It can run into situations where an otherwise to-be-scheduled candidate can't
because it's predecessors haven't been scheduled yet, requiring
"predecessor promotion" implemented in model_promote_predecessors ().
Promotion is essentially bumping INSN model_priority so that it makes it
towards the head of elligible-to-schedule list.
An INSN can have multiple dependencies/predecessor nodes, some of them
being true dependency REG_DEP_TRUE meaning the predecessor register output
is a must have for the INSN to be scheduled.
e.g.
In the sched1 dump below, insn 70 has multiple deps, but 68 and 69 are
true reg deps:
;; | insn | prio |
;; | 68 | 3 | r217=r144+r146
;; | 69 | 5 | r218=flt(r143)
;; | 70 | 2 | [r217]=r218
;; insn code bb dep prio cost reservation
;; ---- ---- -- --- ---- ---- -----------
;; 70 286 6 6 2 1 alu : FW: 97tnm 91m 83tn 78m 76tn
72tn
;; : BK: 69t 68t 57n 60n 61n 64n
^^^ ^^^
Issue
-----
Currently predecessor promotion bumps the priority of all predecessors
to same value, treating the true deps and the rest alike. This simple
strategy can sometimes cause a subtle inadvertent effect: given the right
"other" conditions (depth, height etc) a non true dependency can get
scheduled ahead of the true dep, increasing the live range between the
true dep and the dependent. This increases the peak register pressure
for the BB. Subsequently this inflated peak register pressure steers
the main list schdular, giving it the lower bound to work with.
Main schedular can make pressure worse (due to instruction latencies
and pipeline models etc) but otherwise it can work with the given
peak pressure. Thus a higher model pressure will ultimately lead to a
less than ideal final schedule.
This subtle effect get crazy exacerbated on RISC-V SPEC2017 Cactu
benchmark.
For the spill2.cpp testcase (reduced from Cactu itself) on RISC-V,
here's what was seen:
- After idx #6, insn 70 predecessors are promoted (see list above).
Of the true deps, insn 68 is already schdeuled, insn 69 needs to be.
insn 69 does get promoted (higher priority 4) but so does insn 60
(prio 4) with its predecessor insn 58 getting even higher
promotion (prio 5).
- The insn 58 and its deps chain end up being scheduled ahead of insn 70
such that now there are 3 insns seperating it from its direct dep
insn 69. This blows reg pressure past the spill threshhold of
sched_class_regs_num[GR_REGS] as evident from the pressure
summary at the end.
;; Model schedule:
;;
;; | idx insn | mpri hght dpth prio |
;; | 0 56 | 0 8 0 15 | r210=flt(r185#0) GR_REGS:[25,+0]
FP_REGS:[0,+1]
;; | 1 57 | 0 8 1 12 | [r242+low(`e')]=r210 GR_REGS:[25,+0]
FP_REGS:[1,-1]
;; | 2 63 | 2 7 0 12 | r215=r141+r183 GR_REGS:[25,+1]
FP_REGS:[0,+0]
;; | 3 64 | 1 8 2 11 | r216=[r215] GR_REGS:[26,-1]
FP_REGS:[0,+1]
;; | 4 65 | 0 8 3 8 | r143=fix(r216) GR_REGS:[25,+1]
FP_REGS:[1,-1]
;; | 5 67 | 0 8 4 4 | r146=r143<<0x3 GR_REGS:[26,+1]
FP_REGS:[0,+0]
;; | 6 68 | 0 8 5 3 | r217=r144+r146 GR_REGS:[27,+1]
FP_REGS:[0,+0]
;; +--- priority of 70 = 3, priority of 69 60 61 = 4
;; | 7 69 | 4 7 4 5 | r218=flt(r143)
GR_REGS:[28,+0] FP_REGS:[0,+1]
;; | 8 58 | 6 4 0 5 | r211=r138+r183
GR_REGS:[28,+1] FP_REGS:[1,+0]
^^^^^^^
;; | 9 60 | 5 5 2 4 | r213=[r211]
GR_REGS:[29,-1] FP_REGS:[1,+1]
;; | 10 61 | 4 3 0 4 | r214=[r243+low(`o')]
GR_REGS:[28,+0] FP_REGS:[2,+1]
;; | 11 70 | 3 8 6 2 | [r217]=r218
GR_REGS:[28,-1] FP_REGS:[3,-1]
...
...
;; Pressure summary: GR_REGS:29 FP_REGS:3
So at a 30k foot level, does the model schedule need to actually be a
correct schedule? If so, then we probably can't do what you're trying
to do. We have a memory dependency between insns 60 and 70. If r211
and r217 are potentially the same, then swapping those two instructions
is unsafe.
And if it doesn't strictly need to be a valid schedule are we giving an
overly-optimistic view of the best that can be done from a pressure
standpoint with this change? And if so, is that wise?
I really don't have answers to those questions. But I think they're
pretty fundamental to get a handle on before we can discuss
implementation details.
jeff