https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #10 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
Debug update: -fsched-verbose=99 dumps (they are reaaaaalllly verbose)

For the insn/regs under consideration, the canonical pre-scheduled sequence
with ideal live-range (but non-ideal load-to-use delay) is the following:

;; ======================================================
;; -- basic block 3 from 17 to 98 -- before reload
;; ======================================================

;; | 35 | 10 | r170#0=[r242+low(`u')] alu
;; | 44 | 6 | [r230+low(`_Z1sv')]=r170#0 alu
;; | 46 | 7 | r180=zxt(r170,0x10,0x10) alu
;; | 54 | 6 | [r230+low(const(`_Z1sv'+0x2))]=r180#0 alu
;; | 55 | 7 | r188=r170 0>>0x20 alu
;; | 64 | 6 | [r230+low(const(`_Z1sv'+0x4))]=r188#0 alu
;; | 65 | 7 | r197=r170 0>>0x30 alu
;; | 73 | 6 | [r230+low(const(`_Z1sv'+0x6))]=r197#0 alu

r170 (insn 35) is the central character whose live range has to be longest
because of dependencies.
 - {46, 55, 65} USE r170 and are the sources which create new pseudos
 - {54, 64, 73} are where these new pseudos sink.

How these 2 sets are interleaved defines the register pressure.
 - If the above are ordered src1:sink1:src2:sink2:src3:sink3: 1 reg suffices
 - If src1:src2:src3: 3 regs needed

Per sched1 dumps, the "source" set gets inducted into the ready queue together:

;; dependencies resolved: insn 65
;; tick updated: insn 65 into ready
;; dependencies resolved: insn 55
;; tick updated: insn 55 into ready
;; dependencies resolved: insn 46
;; tick updated: insn 46 into ready
;; dependencies resolved: insn 44
;; tick updated: insn 44 into ready
;; +------------------------------------------------------
;; | Pressure costs for ready queue
;; | pressure points GR_REGS:[26->28 at 17:54] FP_REGS:[1->1 at 0:94]
;; +------------------------------------------------------
;; | 15 44 | 6 +3 | GR_REGS:[0 base cost 0] FP_REGS:[0 base cost 0]
;; | 16 46 | 7 +3 | GR_REGS:[1 base cost 0] FP_REGS:[0 base cost 0]
                    ^^^^
;; | 18 55 | 7 +3 | GR_REGS:[1 base cost 1] FP_REGS:[0 base cost 0]
                    ^^^^
;; | 20 65 | 7 +3 | GR_REGS:[1 base cost 1] FP_REGS:[0 base cost 0]
                    ^^^^
;; | 11 76 | 10 +2 | GR_REGS:[1 base cost 0] FP_REGS:[-1 base cost 0]
;; | 0 94 | 2 +1 | GR_REGS:[0 base cost 0] FP_REGS:[0 base cost 0]
;; | 28 92 | 5 +1 | GR_REGS:[0 base cost 0] FP_REGS:[1 base cost 0]
;; | 26 88 | 5 +1 | GR_REGS:[0 base cost 0] FP_REGS:[1 base cost 0]
;; | 22 79 | 9 +1 | GR_REGS:[0 base cost 0] FP_REGS:[1 base cost 0]
;; +------------------------------------------------------
;; RFS_PRESSURE_DELAY: 7: 44 46 76 94
;; RFS_PRIORITY: 6: 92 88 79
;; RFS_PRESSURE_INDEX: 2: 55
;; Ready list (t = 10): 65:44(cost=1:prio=7:delay=3:idx=20) 55:42(cost=1:prio=7:delay=3:idx=18) 44:39(cost=0:prio=6:delay=3:idx=15) 46:40(cost=0:prio=7:delay=3:idx=16) 76:47(cost=0:prio=10:delay=2:idx=11) 94:58(cost=0:prio=2:delay=1:idx=0) 92:56(cost=0:prio=5:delay=1:idx=28) 88:54(cost=0:prio=5:delay=1:idx=26) 79:48(cost=0:prio=9:delay=1:idx=22)

As the algorithm converges, they move around a bit, but rarely are the
src/sink considered in the same iteration, and if at all, only 1

;; +------------------------------------------------------
;; | Pressure costs for ready queue
;; | pressure points GR_REGS:[29->29 at 0:94] FP_REGS:[1->1 at 0:94]
;; +------------------------------------------------------
...
;; | 19 64 | 6 +0 | GR_REGS:[-1 base cost -1] FP_REGS:[0 base cost 0]
;; | 17 54 | 6 +0 | GR_REGS:[-1 base cost -1] FP_REGS:[0 base cost 0]
;; | 20 65 | 7 +0 | GR_REGS:[0 base cost 0] FP_REGS:[0 base cos

All of this leads to the pessimistic schedule emitted in the end.

I'm still trying to wrap my head around the humungous dump info.
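
For reference, the src/sink interleaving argument can be checked with a small standalone sketch. This is only a toy model, not anything from the scheduler itself: the function name peak_live and the two tables are made up for illustration, the insn and pseudo numbers are taken from the pre-scheduled sequence in this comment, and r170 is left out of the count since it is live throughout either way.

```python
# Toy model (hypothetical, not GCC code): each "source" insn defines one new
# pseudo, and each "sink" insn is the last use of that pseudo.
# Insn/pseudo numbers come from the pre-scheduled sequence in this comment.
DEFS = {46: "r180", 55: "r188", 65: "r197"}      # source insn -> pseudo defined
LAST_USE = {54: "r180", 64: "r188", 73: "r197"}  # sink insn -> pseudo killed


def peak_live(order):
    """Return the peak number of new pseudos live at once for a given insn order.

    r170 is deliberately not counted: it stays live across all three sources
    regardless of how sources and sinks are interleaved.
    """
    live = set()
    peak = 0
    for insn in order:
        if insn in LAST_USE:
            live.discard(LAST_USE[insn])  # pseudo dies at its sink
        if insn in DEFS:
            live.add(DEFS[insn])          # pseudo is born at its source
        peak = max(peak, len(live))
    return peak


# src1:sink1:src2:sink2:src3:sink3
print(peak_live([46, 54, 55, 64, 65, 73]))  # 1
# src1:src2:src3 followed by the sinks
print(peak_live([46, 55, 65, 54, 64, 73]))  # 3
```

The second ordering is the shape that results when all three sources enter the ready queue together and get picked before any sink.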