On 8/5/24 5:35 PM, Vineet Gupta wrote:
Hi Richard,

I'm reaching out for some insight on PR/114729. Apologies in advance for
the long email.

On RISC-V we are hitting sched1 pathology on SPEC2017 Cactu where
codegen spills are overwhelming the execution: disabling sched1 shaves
off 1.3 trillion dynamic icounts which is about half of total execution
(in user mode qemu run).

I have a reduced test down to a single spill (w/ sched1 and non w/o).
The full sched1 verbose log for test is attached for reference (so is
the test case itself).

The gist of the issue (from schedule pov) is annotated insn below: 46,
54, 55

     ld    a5,%lo(u)(s0)
     fld    fa5,%lo(f)(t6)
     fcvt.l.d a4,fa4,rtz
     srli    a0,a5,16                  # insn 46  (ideal order 1)
     srli    a1,a5,32                  # insn 55  (ideal order 3)
     sh    a5,%lo(_Z1sv)(a3)    # insn 44 (not directly related in
ordering but plays a role during ready_sort)
     slli    a4,a4,3
     srli    a5,a5,48
     fsd    fa5,%lo(e)(t0)
     add    a4,s9,a4
     sh    a0,%lo(_Z1sv+2)(a3)        # insn 54  (ideal order 2)
     sh    a1,%lo(_Z1sv+4)(a3)        # insn 64
     sh    a5,%lo(_Z1sv+6)(a3)

If insn 54 were scheduled ahead of insn 55, the corresponding reg need
not be allocated (and consequently not spilled) in the outer loop.
There are no uses of a0 after insn 54.
So what does the ready queue look like at the start of whatever cycle insn 46 fires on? I would expect insn 46, 55, 44, the slli & fsd just based on data dependencies.




Looking into sched1 logs (#8916):  schedule_insn () is called for insn
46 and insn 54 added to ready q.
Presumably those happen on different cycles? I would not expect 45 to enter the ready queue on some cycle N. Then on N+M insn 54 should enter the ready queue, prsumably with M == 1.


Then we have ready_sort -> ready_sort_real () -> model_set_excess_costs
() called for insns in ready q.

For insn 54, model_excess_cost () there is a reg dead, hence the -1 in
print below. However its model_excess_group_cost () is still 0,
disregarding the delta -1.

;;        |  17   54 |    6  +1 | GR_REGS:[-1 base cost 0] FP_REGS:[0
base cost 0]

Per comments around model_set_excess_costs () this seems intentional /
as designed "... negative costs are converted to zero ones ...".

The subsequent qsort () w/ numerous gyrations of rank_for_schedule()
ends up moving 55 to top.
Presumably due to the number of uses and the types of uses.


;;    Ready list (t =  14):
81:50(cost=0:prio=7:delay=1:idx=12)
94:58(cost=0:prio=2:idx=0)
92:56(cost=0:prio=5:idx=28)
88:54(cost=0:prio=5:idx=26)
80:49(cost=0:prio=6:idx=23)
54:41(cost=0:prio=6:idx=17)
44:39(cost=0:prio=6:idx=15)
65:44(cost=0:prio=7:idx=20)
55:42(cost=0:prio=7:idx=18)
Hard to know what to make of this, most of the insns referenced in the queue don't refer back to anything you've shown above.




This does change scheduling to move insn 44 ahead of insn 54 and in the
next cycle, ideal insn 54 is picked (ahead of insn 55) avoiding the spill.

So it seems it all comes down to ready_sort () -> rank_for_schedule()
for deciding the winner (added prints in rfs_result)

...
;; --1-- RFS_PRIORITY winner 55 looser 54
;; --2-- RFS_PRIORITY winner 55 looser 54
Which is correct. From a critical path standpoint it's more important to get insn 55 issued than it is to get insn 54 issued. But......

...
;; --1-- RFS_PRIORITY winner 55 looser 44
;; --2-- RFS_PRIORITY winner 55 looser 44
;; --1-- RFS_PRESSURE_INDEX winner 55 looser 65
;; --1-- RFS_PRESSURE_INDEX winner 55 looser 65
;; --2-- RFS_PRESSURE_INDEX winner 55 looser 65



Note that priority of insn 55 (prio 7, alu insn) will always be higher
than insn 54 (prio 6, a store) since insn 55 SD_LIST_FWD has a
store/sink (insn 64, also with a prio 6).
And since ECC (model_set_excess_costs) calculations also include
priority, there's a bias towards priority in the ranking.
Meaning all else being constant, priority will invariably cause a win.
Right, that's right bias in general. But with the priorities only differing by 1 unit, I would have expected the increase in register pressure to be enough to overcome that bias or at least make them equal.


I did find a potential issue where it seems to be disregarding prio tmp
prio tmp2 (although this doesn't fix the issue)
Unsure ;-)

Note that in rank_for_schedule we check pressure state before we check priority. So it's a bit unclear why RFS_PRIORITY was selected when comparing insns 55 and 54. I guess that would tend to indicate that the ECC wasn't enough to make a difference:

 if (sched_pressure != SCHED_PRESSURE_NONE)
    {
      /* Prefer insn whose scheduling results in the smallest register
         pressure excess.  */
      if ((diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp)
                   + insn_delay (tmp)
                   - INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2)
                   - insn_delay (tmp2))))
        return rfs_result (RFS_PRESSURE_DELAY, diff, tmp, tmp2);
    }

  if (sched_pressure != SCHED_PRESSURE_NONE
      && (INSN_TICK (tmp2) > clock_var || INSN_TICK (tmp) > clock_var)
      && INSN_TICK (tmp2) != INSN_TICK (tmp))
    {
      diff = INSN_TICK (tmp) - INSN_TICK (tmp2);
      return rfs_result (RFS_PRESSURE_TICK, diff, tmp, tmp2);
    }


jeff

Reply via email to