Hi Jeff

On 8/5/24 21:31, Jeff Law wrote:
> On 8/5/24 5:35 PM, Vineet Gupta wrote:
>> Hi Richard,
>>
>> I'm reaching out for some insight on PR/114729. Apologies in advance for
>> the long email.
>>
>> On RISC-V we are hitting sched1 pathology on SPEC2017 Cactu where
>> codegen spills are overwhelming the execution: disabling sched1 shaves
>> off 1.3 trillion dynamic icounts which is about half of total execution
>> (in user mode qemu run).
>>
>> I have a reduced test down to a single spill (w/ sched1 and non w/o).
>> The full sched1 verbose log for test is attached for reference (so is
>> the test case itself).
>>
>> The gist of the issue (from schedule pov) is annotated insn below: 46,
>> 54, 55
>>
>>      ld    a5,%lo(u)(s0)
>>      fld    fa5,%lo(f)(t6)
>>      fcvt.l.d a4,fa4,rtz
>>      srli    a0,a5,16                  # insn 46  (ideal order 1)
>>      srli    a1,a5,32                  # insn 55  (ideal order 3)
>>      sh    a5,%lo(_Z1sv)(a3)           # insn 44
>>      slli    a4,a4,3
>>      srli    a5,a5,48
>>      fsd    fa5,%lo(e)(t0)
>>      add    a4,s9,a4
>>      sh    a0,%lo(_Z1sv+2)(a3)        # insn 54  (ideal order 2)
>>      sh    a1,%lo(_Z1sv+4)(a3)        # insn 64
>>      sh    a5,%lo(_Z1sv+6)(a3)
>>
>> If insn 54 were scheduled ahead of insn 55, the corresponding reg need
>> not be allocated (and consequently not spilled) in the outer loop.
>> There are no uses of a0 after insn 54.
> So what does the ready queue look like at the start of whatever cycle 
> insn 46 fires on?  I would expect insn 46, 55, 44, the slli & fsd just 
> based on data dependencies.

Dump just before insn 46 is scheduled (most of them are from tail end of
BB due to it processing insns from end)

;;        Ready list after ready_sort:
81:50(cost=0:prio=7:delay=2:idx=12) 
80:49(cost=0:prio=6:delay=1:idx=23)  65:44(cost=1:prio=7:idx=20) 
55:42(cost=1:prio=7:idx=18)  94:58(cost=0:prio=2:idx=0) 
92:56(cost=0:prio=5:idx=28)  88:54(cost=0:prio=5:idx=26) 
44:39(cost=0:prio=6:idx=15)  46:40(cost=0:prio=7:idx=16)
;;     13--> b  0: i  46 r180=zxt(r170,0x10,0x10)               
:alu:@GR_REGS+1(1)@FP_REGS+0(0)

46 is at the head hence it is scheduled.

>> Looking into sched1 logs (#8916):  schedule_insn () is called for insn
>> 46 and insn 54 added to ready q.
> Presumably those happen on different cycles?   I would not expect 45 to 
> enter the ready queue on some cycle N.  Then on N+M insn 54 should enter 
> the ready queue, prsumably with M == 1.

Nope they are in same cycle since 54 is in SD_LIST_FORW of insn 46.

  advance = schedule_insn(insn = 46)
       for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
            sd_iterator_cond (&sd_it, &dep);)
            try_ready
              fix_tick_ready
                 change_queue_index(insn = 54, delay=QUEUE_READY) 

>> Then we have ready_sort -> ready_sort_real () -> model_set_excess_costs
>> () called for insns in ready q.
>>
>> For insn 54, model_excess_cost () there is a reg dead, hence the -1 in
>> print below. However its model_excess_group_cost () is still 0,
>> disregarding the delta -1.
>>
>> ;;        |  17   54 |    6  +1 | GR_REGS:[-1 base cost 0] FP_REGS:[0
>> base cost 0]
>>
>> Per comments around model_set_excess_costs () this seems intentional /
>> as designed "... negative costs are converted to zero ones ...".
>>
>> The subsequent qsort () w/ numerous gyrations of rank_for_schedule()
>> ends up moving 55 to top.
> Presumably due to the number of uses and the types of uses.

Its the 3 things in following order:

RFS_PRESSURE_DELAY  (low wins): ECC(tmp) + insn_delay(tmp) - ECC (tmp2)
- insn_delay (tmp2)
RFS_PRIORITY (high wins): INSN_PRIORITY (tmp2) - INSN_PRIORITY (tmp)
RFS_PRESSURE_INDEX (low wins): model_index (tmp) - model_index (tmp2)

>> ;;    Ready list (t =  14):
>> 81:50(cost=0:prio=7:delay=1:idx=12)
>> 94:58(cost=0:prio=2:idx=0)
>> 92:56(cost=0:prio=5:idx=28)
>> 88:54(cost=0:prio=5:idx=26)
>> 80:49(cost=0:prio=6:idx=23)
>> 54:41(cost=0:prio=6:idx=17)
>> 44:39(cost=0:prio=6:idx=15)
>> 65:44(cost=0:prio=7:idx=20)
>> 55:42(cost=0:prio=7:idx=18)
> Hard to know what to make of this, most of the insns referenced in the 
> queue don't refer back to anything you've shown above.

Yes, as most of them are from towards end of BB. These are just ready,
but the one at the last is the sorted one which will be scheduled in the
insn stream.


>
>> This does change scheduling to move insn 44 ahead of insn 54 and in the
>> next cycle, ideal insn 54 is picked (ahead of insn 55) avoiding the spill.
>>
>> So it seems it all comes down to ready_sort () -> rank_for_schedule()
>> for deciding the winner (added prints in rfs_result)
>>
>> ...
>> ;; --1-- RFS_PRIORITY winner 55 looser 54
>> ;; --2-- RFS_PRIORITY winner 55 looser 54
> Which is correct.  From a critical path standpoint it's more important 
> to get insn 55 issued than it is to get insn 54 issued. But......
>
>> ...
>> ;; --1-- RFS_PRIORITY winner 55 looser 44
>> ;; --2-- RFS_PRIORITY winner 55 looser 44
>> ;; --1-- RFS_PRESSURE_INDEX winner 55 looser 65
>> ;; --1-- RFS_PRESSURE_INDEX winner 55 looser 65
>> ;; --2-- RFS_PRESSURE_INDEX winner 55 looser 65
>>
>> Note that priority of insn 55 (prio 7, alu insn) will always be higher
>> than insn 54 (prio 6, a store) since insn 55 SD_LIST_FWD has a
>> store/sink (insn 64, also with a prio 6).
>> And since ECC (model_set_excess_costs) calculations also include
>> priority, there's a bias towards priority in the ranking.
>> Meaning all else being constant, priority will invariably cause a win.
> Right, that's right bias in general.  But with the priorities only 
> differing by 1 unit, I would have expected the increase in register 
> pressure to be enough to overcome that bias or at least make them equal.

The actual reg pressure delta is subsumed in the ECC value which tends
to convert negative biases to zero.

>
>> I did find a potential issue where it seems to be disregarding prio tmp > 
>> prio tmp2 (although this doesn't fix the issue)
> Unsure ;-)

Yeah sorry that was stupid.


> Note that in rank_for_schedule we check pressure state before we check 
> priority.  So it's a bit unclear why RFS_PRIORITY was selected when 
> comparing insns 55 and 54.  I guess that would tend to indicate that the 
> ECC wasn't enough to make a difference:

Indeed. ECC delta would have to be neutral for it to be skipped. The
order of things checked is what I enumerated above.

I'm currently pursuing a different trail which comes form observation
that initial model setup concludes that pressure is 28 so with 27
allocable regs we are bound to spill one.
More on that after I find something concrete.

-Vineet

Reply via email to