On 10/30/24 18:11, Jeff Law wrote:
> On 10/20/24 1:40 PM, Vineet Gupta wrote:
>> Background
>> ----------
>> sched1 runs a preliminary "model schedular" ahead of the main list schedular.
>> Its sole purpose is to keep register pressure to mimimum [1] and it uses DFA
>> register depenendency tracking to arrange insns.
>>
>>     [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2011-12/msg01684.html
>>
>>                             `The idea was to construct a preliminary
>>     "model" schedule in which the only objective is to keep register
>>     pressure to a minimum.  This schedule ignores pipeline characteristics,
>>     latencies, and the number of available registers.  The maximum pressure
>>     seen in this initial model schedule (MP) is then the benchmark for 
>> ECC(X).`
>>
>> It starts off with an intial "worklist" of insns w/o any prior dependencies,
>> scheduling them, adding successors of scheduled insn to the worklist and so
>> on until all insns in the basic block are done.
>> It can run into situations where an otherwise to-be-scheduled candidate can't
>> because it's predecessors haven't been scheduled yet, requiring
>> "predecessor promotion" implemented in model_promote_predecessors ().
>> Promotion is essentially bumping INSN model_priority so that it makes it
>> towards the head of elligible-to-schedule list.
>>
>> An INSN can have multiple dependencies/predecessor nodes, some of them
>> being true dependency REG_DEP_TRUE meaning the predecessor register output
>> is a must have for the INSN to be scheduled.
>> e.g.
>> In the sched1 dump below, insn 70 has multiple deps, but 68 and 69 are
>> true reg deps:
>>
>> ;;   | insn | prio |
>> ;;   |   68 |    3 | r217=r144+r146
>> ;;   |   69 |    5 | r218=flt(r143)
>> ;;   |   70 |    2 | [r217]=r218
>>
>> ;;      insn  code    bb   dep  prio  cost   reservation
>> ;;      ----  ----    --   ---  ----  ----   -----------
>> ;;       70   286     6     6     2     1   alu      : FW: 97tnm 91m 83tn 
>> 78m 76tn 72tn
>> ;;                                           : BK: 69t 68t 57n 60n 61n 64n
>>                                                        ^^^ ^^^

Indeed insn 60 is in backwards dependency of insn 70, although its not a 
true/data dependency.
'n' in dump above implies DEP_NONREG.

>>
>> Issue
>> -----
>> Currently predecessor promotion bumps the priority of all predecessors
>> to same value, treating the true deps and the rest alike. This simple
>> strategy can sometimes cause a subtle inadvertent effect: given the right
>> "other" conditions (depth, height etc) a non true dependency can get
>> scheduled ahead of the true dep, increasing the live range between the
>> true dep and the dependent. This increases the peak register pressure
>> for the BB. Subsequently this inflated peak register pressure steers
>> the main list schdular, giving it the lower bound to work with.
>> Main schedular can make pressure worse (due to instruction latencies
>> and pipeline models etc) but otherwise it can work with the given
>> peak pressure. Thus a higher model pressure will ultimately lead to a
>> less than ideal final schedule.
>>
>> This subtle effect get crazy exacerbated on RISC-V SPEC2017 Cactu
>> benchmark.
>>
>> For the spill2.cpp testcase (reduced from Cactu itself) on RISC-V,
>> here's what was seen:
>>
>>    - After idx #6, insn 70 predecessors are promoted (see list above).
>>      Of the true deps, insn 68 is already schdeuled, insn 69 needs to be.
>>      insn 69 does get promoted (higher priority 4) but so does insn 60
>>      (prio 4) with its predecessor insn 58 getting even higher
>>      promotion (prio 5).
>>
>>    - The insn 58 and its deps chain end up being scheduled ahead of insn 70
>>      such that now there are 3 insns seperating it from its direct dep
>>      insn 69. This blows reg pressure past the spill threshhold of
>>      sched_class_regs_num[GR_REGS] as evident from the pressure
>>      summary at the end.
>>
>> ;;   Model schedule:
>> ;;
>> ;;   | idx insn | mpri hght dpth prio |
>> ;;   |   0   56 |    0    8    0   15 | r210=flt(r185#0)     GR_REGS:[25,+0] 
>> FP_REGS:[0,+1]
>> ;;   |   1   57 |    0    8    1   12 | [r242+low(`e')]=r210 GR_REGS:[25,+0] 
>> FP_REGS:[1,-1]
>> ;;   |   2   63 |    2    7    0   12 | r215=r141+r183       GR_REGS:[25,+1] 
>> FP_REGS:[0,+0]
>> ;;   |   3   64 |    1    8    2   11 | r216=[r215]          GR_REGS:[26,-1] 
>> FP_REGS:[0,+1]
>> ;;   |   4   65 |    0    8    3    8 | r143=fix(r216)       GR_REGS:[25,+1] 
>> FP_REGS:[1,-1]
>> ;;   |   5   67 |    0    8    4    4 | r146=r143<<0x3       GR_REGS:[26,+1] 
>> FP_REGS:[0,+0]
>> ;;   |   6   68 |    0    8    5    3 | r217=r144+r146       GR_REGS:[27,+1] 
>> FP_REGS:[0,+0]
>>
>> ;;   +--- priority of 70 = 3, priority of 69 60 61 = 4
>>
>> ;;   |   7   69 |    4    7    4    5 | r218=flt(r143)        
>> GR_REGS:[28,+0] FP_REGS:[0,+1]
>> ;;   |   8   58 |    6    4    0    5 | r211=r138+r183        
>> GR_REGS:[28,+1] FP_REGS:[1,+0]
>>                                                                           
>> ^^^^^^^
>> ;;   |   9   60 |    5    5    2    4 | r213=[r211]           
>> GR_REGS:[29,-1] FP_REGS:[1,+1]
>> ;;   |  10   61 |    4    3    0    4 | r214=[r243+low(`o')]  
>> GR_REGS:[28,+0] FP_REGS:[2,+1]
>> ;;   |  11   70 |    3    8    6    2 | [r217]=r218           
>> GR_REGS:[28,-1] FP_REGS:[3,-1]
>> ...
>> ...
>> ;; Pressure summary: GR_REGS:29 FP_REGS:3
> So at a 30k foot level, does the model schedule need to actually be a 
> correct schedule?

>From my reading of the code and the background posts pointed to by Richard [1],
it doesn't have to be - its sole purpose is to compute the *minimum* register 
pressure.

   [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2011-12/msg01684.html

|                                                      `The idea was to 
construct a preliminary
| "model" schedule in which the only objective is to keep register
| pressure to a minimum. This schedule ignores pipeline characteristics,
| latencies, and the number of available registers. The maximum pressure
| seen in this initial model schedule (MP) is then the benchmark for ECC(X).`

It creates the initial arrangement of insns but stuff does get moved around by
 main scheduler (seems to happen for this very case too, see below)

> If so, then we probably can't do what you're trying 
> to do.  We have a memory dependency between insns 60 and 70.  If r211 
> and r217 are potentially the same, then swapping those two instructions 
> is unsafe.

Whao ! steely eyes.

So model schedule is  indeed ordering them as 70 -> 60

;;    |   0   56 |    0    8    0   15 | r210=flt(r185#0)                
GR_REGS:[25,+0] FP_REGS:[0,+1]
;;    |   1   57 |    0    8    1   12 | [r242+low(`e')]=r210            
GR_REGS:[25,+0] FP_REGS:[1,-1]
;;    |   2   63 |    2    7    0   12 | r215=r141+r183                  
GR_REGS:[25,+1] FP_REGS:[0,+0]
;;    |   3   64 |    1    8    2   11 | r216=[r215]                     
GR_REGS:[26,-1] FP_REGS:[0,+1]
;;    |   4   65 |    0    8    3    8 | r143=fix(r216)                  
GR_REGS:[25,+1] FP_REGS:[1,-1]
;;    |   5   67 |    0    8    4    4 | r146=r143<<0x3                  
GR_REGS:[26,+1] FP_REGS:[0,+0]
;;    |   6   68 |    0    8    5    3 | r217=r144+r146                  
GR_REGS:[27,+1] FP_REGS:[0,+0]
;;    |   7   69 |    4    7    4    5 | r218=flt(r143)                  
GR_REGS:[28,+0] FP_REGS:[0,+1]
;;    |   8   70 |    3    8    6    2 | [r217]=r218                     
GR_REGS:[28,-1] FP_REGS:[1,-1]
;;    |   9   58 |    6    4    0    5 | r211=r138+r183                  
GR_REGS:[27,+1] FP_REGS:[0,+0]
;;    |  10   60 |    5    5    2    4 | r213=[r211]                     
GR_REGS:[28,-1] FP_REGS:[0,+1]


And Main scheduler reorders them back in orig order 60 -> 70

;;      0--> b  0: i  56 r210=flt(r185#0)                        
:alu:GR_REGS+0(0)FP_REGS+1(1):model 0
;;      1--> b  0: i  63 r215=r141+r183                          
:alu:GR_REGS+1(1)FP_REGS+0(0)
;;      2--> b  0: i  61 r214=[r243+low(`o')]                    
:alu:@GR_REGS+0(0)@FP_REGS+1(1)
;;      3--> b  0: i  57 [r242+low(`e')]=r210                    
:alu:@GR_REGS+0(0)@FP_REGS+0(-1):model 1
;;      4--> b  0: i  64 r216=[r215]                             
:alu:GR_REGS+0(-1)FP_REGS+1(1):model 3
;;      5--> b  4: i 102 r179=sxn(r179#0+0x1)                    
:alu:GR_REGS+1(0)FP_REGS+0(0)
;;      6--> b  4: i 105 r185=sxn(r185#0+r205#0)                 
:alu:GR_REGS+1(0)FP_REGS+0(0)
;;      7--> b  0: i  65 r143=fix(r216)                          
:alu:@GR_REGS+1(1)@FP_REGS+0(-1):model 4
;;      8--> b  0: i  58 r211=r138+r183                          
:alu:@GR_REGS+1(1)@FP_REGS+0(0)
;;      9--> b  0: i  60 r213=[r211]                             
:alu:GR_REGS+0(-1)FP_REGS+1(1)
;;     10--> b  0: i  69 r218=flt(r143)                          
:alu:GR_REGS+0(0)FP_REGS+1(1)
;;     11--> b  0: i  67 r146=r143<<0x3                          
:alu:@GR_REGS+1(1)@FP_REGS+0(0):model 5
;;     12--> b  0: i  68 r217=r144+r146                          
:alu:GR_REGS+1(1)FP_REGS+0(0):model 6
;;     13--> b  0: i  70 [r217]=r218                             
:alu:GR_REGS+0(-1)FP_REGS+0(-1):model 8



> And if it doesn't strictly need to be a valid schedule are we giving an 
> overly-optimistic view of the best that can be done from a pressure 
> standpoint with this change?  And if so, is that wise?

As I mentioned above, the design goal of model schedule is to keep pressure to 
min.
So indeed we might be a bit more optimistic than reality here. But main list 
scheduler fixes
that if that leads to undesired outcomes. What we are trying to do here is not 
pessimize
 in certain cases, especially when that's not by design but just an outcome of 
the
 implementation subtlety.

As I look at it, for this example, the incoming insn stream (at the entry of 
sched1) doesn't
 lead to spilling. However the "initial" arrangement of insn by model schedule 
ends up
requiring a spill and the clearly is an algorithmic/implementation oversight if 
not an
outright bug.


> I really don't have answers to those questions.  But I think they're 
> pretty fundamental to get a handle on before we can discuss 
> implementation details.

Of course these questions and challenges need to be put up if we are trying to 
making
fundamental behavioral changes to things established over a decade ago.
Every thing must be questioned and reasoned about. My understanding of this is 
very
recent, it would also be good if Richard or other folks experienced in art 
chime in as well.

Thx,
-Vineet

Reply via email to