Hi Richard,

I'm reaching out for some insight on PR/114729. Apologies in advance for
the long email.

On RISC-V we are hitting sched1 pathology on SPEC2017 Cactu where
codegen spills are overwhelming the execution: disabling sched1 shaves
off 1.3 trillion dynamic icounts which is about half of total execution
(in user mode qemu run).

I have a reduced test down to a single spill (w/ sched1 and non w/o).
The full sched1 verbose log for test is attached for reference (so is
the test case itself).

The gist of the issue (from schedule pov) is annotated insn below: 46,
54, 55

    ld    a5,%lo(u)(s0)
    fld    fa5,%lo(f)(t6)
    fcvt.l.d a4,fa4,rtz
    srli    a0,a5,16                  # insn 46  (ideal order 1)
    srli    a1,a5,32                  # insn 55  (ideal order 3)
    sh    a5,%lo(_Z1sv)(a3)    # insn 44 (not directly related in
ordering but plays a role during ready_sort)
    slli    a4,a4,3
    srli    a5,a5,48
    fsd    fa5,%lo(e)(t0)
    add    a4,s9,a4
    sh    a0,%lo(_Z1sv+2)(a3)        # insn 54  (ideal order 2)
    sh    a1,%lo(_Z1sv+4)(a3)        # insn 64
    sh    a5,%lo(_Z1sv+6)(a3)

If insn 54 were scheduled ahead of insn 55, the corresponding reg need
not be allocated (and consequently not spilled) in the outer loop.
There are no uses of a0 after insn 54.

Looking into sched1 logs (#8916):  schedule_insn () is called for insn
46 and insn 54 added to ready q.
Then we have ready_sort -> ready_sort_real () -> model_set_excess_costs
() called for insns in ready q.

For insn 54, model_excess_cost () there is a reg dead, hence the -1 in
print below. However its model_excess_group_cost () is still 0,
disregarding the delta -1.

;;        |  17   54 |    6  +1 | GR_REGS:[-1 base cost 0] FP_REGS:[0
base cost 0]

Per comments around model_set_excess_costs () this seems intentional /
as designed "... negative costs are converted to zero ones ...".

The subsequent qsort () w/ numerous gyrations of rank_for_schedule()
ends up moving 55 to top.

;;    Ready list (t =  14):   
81:50(cost=0:prio=7:delay=1:idx=12) 
94:58(cost=0:prio=2:idx=0) 
92:56(cost=0:prio=5:idx=28)
88:54(cost=0:prio=5:idx=26) 
80:49(cost=0:prio=6:idx=23) 
54:41(cost=0:prio=6:idx=17)
44:39(cost=0:prio=6:idx=15) 
65:44(cost=0:prio=7:idx=20) 
55:42(cost=0:prio=7:idx=18)

As a hack (and to see algorithm behavior) I tried to account for
negative delta

@@ -2433,7 +2433,7 @@ model_excess_cost (rtx_insn *insn, bool print_p)

-  cost = 0;
+  cost = -999;

       delta = insn_reg_pressure[pci].set_increase - insn_death[cl];
+      if (cost == -999)
+        cost = delta;

This does change scheduling to move insn 44 ahead of insn 54 and in the
next cycle, ideal insn 54 is picked (ahead of insn 55) avoiding the spill.

So it seems it all comes down to ready_sort () -> rank_for_schedule()
for deciding the winner (added prints in rfs_result)

...
;; --1-- RFS_PRIORITY winner 55 looser 54
;; --2-- RFS_PRIORITY winner 55 looser 54
...
;; --1-- RFS_PRIORITY winner 55 looser 44
;; --2-- RFS_PRIORITY winner 55 looser 44
;; --1-- RFS_PRESSURE_INDEX winner 55 looser 65
;; --1-- RFS_PRESSURE_INDEX winner 55 looser 65
;; --2-- RFS_PRESSURE_INDEX winner 55 looser 65


Note that priority of insn 55 (prio 7, alu insn) will always be higher
than insn 54 (prio 6, a store) since insn 55 SD_LIST_FWD has a
store/sink (insn 64, also with a prio 6).
And since ECC (model_set_excess_costs) calculations also include
priority, there's a bias towards priority in the ranking.
Meaning all else being constant, priority will invariably cause a win.

I did find a potential issue where it seems to be disregarding prio tmp
> prio tmp2 (although this doesn't fix the issue)

@@ -2723,7 +2744,7 @@ rank_for_schedule (const void *x, const void *y)

-  if (flag_sched_critical_path_heuristic && priority_val)
+  if (flag_sched_critical_path_heuristic && priority_val != 0)
     return rfs_result (RFS_PRIORITY, priority_val, tmp, tmp2);

Any ideas on how to tackle this would be very much appreciated.

Thx,
-Vineet

Attachment: spill.tar.gz
Description: application/gzip

Reply via email to