On 10/20/24 1:40 PM, Vineet Gupta wrote:
Background
----------
sched1 runs a preliminary "model schedular" ahead of the main list schedular.
Its sole purpose is to keep register pressure to mimimum [1] and it uses DFA
register depenendency tracking to arrange insns.

    [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2011-12/msg01684.html

                            `The idea was to construct a preliminary
    "model" schedule in which the only objective is to keep register
    pressure to a minimum.  This schedule ignores pipeline characteristics,
    latencies, and the number of available registers.  The maximum pressure
    seen in this initial model schedule (MP) is then the benchmark for ECC(X).`

It starts off with an intial "worklist" of insns w/o any prior dependencies,
scheduling them, adding successors of scheduled insn to the worklist and so
on until all insns in the basic block are done.
It can run into situations where an otherwise to-be-scheduled candidate can't
because it's predecessors haven't been scheduled yet, requiring
"predecessor promotion" implemented in model_promote_predecessors ().
Promotion is essentially bumping INSN model_priority so that it makes it
towards the head of elligible-to-schedule list.

An INSN can have multiple dependencies/predecessor nodes, some of them
being true dependency REG_DEP_TRUE meaning the predecessor register output
is a must have for the INSN to be scheduled.
e.g.
In the sched1 dump below, insn 70 has multiple deps, but 68 and 69 are
true reg deps:

;;      | insn | prio |
;;      |   68 |    3 | r217=r144+r146
;;      |   69 |    5 | r218=flt(r143)
;;      |   70 |    2 | [r217]=r218

;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;       70   286     6     6     2     1   alu : FW: 97tnm 91m 83tn 78m 76tn 
72tn
;;                                              : BK: 69t 68t 57n 60n 61n 64n
                                                       ^^^ ^^^

Issue
-----
Currently predecessor promotion bumps the priority of all predecessors
to same value, treating the true deps and the rest alike. This simple
strategy can sometimes cause a subtle inadvertent effect: given the right
"other" conditions (depth, height etc) a non true dependency can get
scheduled ahead of the true dep, increasing the live range between the
true dep and the dependent. This increases the peak register pressure
for the BB. Subsequently this inflated peak register pressure steers
the main list schdular, giving it the lower bound to work with.
Main schedular can make pressure worse (due to instruction latencies
and pipeline models etc) but otherwise it can work with the given
peak pressure. Thus a higher model pressure will ultimately lead to a
less than ideal final schedule.

This subtle effect get crazy exacerbated on RISC-V SPEC2017 Cactu
benchmark.

For the spill2.cpp testcase (reduced from Cactu itself) on RISC-V,
here's what was seen:

   - After idx #6, insn 70 predecessors are promoted (see list above).
     Of the true deps, insn 68 is already schdeuled, insn 69 needs to be.
     insn 69 does get promoted (higher priority 4) but so does insn 60
     (prio 4) with its predecessor insn 58 getting even higher
     promotion (prio 5).

   - The insn 58 and its deps chain end up being scheduled ahead of insn 70
     such that now there are 3 insns seperating it from its direct dep
     insn 69. This blows reg pressure past the spill threshhold of
     sched_class_regs_num[GR_REGS] as evident from the pressure
     summary at the end.

;;      Model schedule:
;;
;;      | idx insn | mpri hght dpth prio |
;;      |   0   56 |    0    8    0   15 | r210=flt(r185#0)     GR_REGS:[25,+0] 
FP_REGS:[0,+1]
;;      |   1   57 |    0    8    1   12 | [r242+low(`e')]=r210 GR_REGS:[25,+0] 
FP_REGS:[1,-1]
;;      |   2   63 |    2    7    0   12 | r215=r141+r183       GR_REGS:[25,+1] 
FP_REGS:[0,+0]
;;      |   3   64 |    1    8    2   11 | r216=[r215]          GR_REGS:[26,-1] 
FP_REGS:[0,+1]
;;      |   4   65 |    0    8    3    8 | r143=fix(r216)       GR_REGS:[25,+1] 
FP_REGS:[1,-1]
;;      |   5   67 |    0    8    4    4 | r146=r143<<0x3       GR_REGS:[26,+1] 
FP_REGS:[0,+0]
;;      |   6   68 |    0    8    5    3 | r217=r144+r146       GR_REGS:[27,+1] 
FP_REGS:[0,+0]

;;      +--- priority of 70 = 3, priority of 69 60 61 = 4

;;      |   7   69 |    4    7    4    5 | r218=flt(r143)        
GR_REGS:[28,+0] FP_REGS:[0,+1]
;;      |   8   58 |    6    4    0    5 | r211=r138+r183        
GR_REGS:[28,+1] FP_REGS:[1,+0]
                                                                          
^^^^^^^
;;      |   9   60 |    5    5    2    4 | r213=[r211]           
GR_REGS:[29,-1] FP_REGS:[1,+1]
;;      |  10   61 |    4    3    0    4 | r214=[r243+low(`o')]  
GR_REGS:[28,+0] FP_REGS:[2,+1]
;;      |  11   70 |    3    8    6    2 | [r217]=r218           
GR_REGS:[28,-1] FP_REGS:[3,-1]
...
...
;; Pressure summary: GR_REGS:29 FP_REGS:3
So at a 30k foot level, does the model schedule need to actually be a correct schedule? If so, then we probably can't do what you're trying to do. We have a memory dependency between insns 60 and 70. If r211 and r217 are potentially the same, then swapping those two instructions is unsafe.

And if it doesn't strictly need to be a valid schedule are we giving an overly-optimistic view of the best that can be done from a pressure standpoint with this change? And if so, is that wise?

I really don't have answers to those questions. But I think they're pretty fundamental to get a handle on before we can discuss implementation details.

jeff


Reply via email to