Let me add to my own post saying that it seems that the problem is that the list scheduler is greedy in the sense that it will take an instruction from the ready list no matter what when waiting and trying to pair it with later on with another instruction might be more beneficial. In a sense it seems that the idea is that 'issuing instructions as soon as possible is better' which might be true for a single issue chip but a VLIW with multiple issue has to contend with other problems.
Any thoughts on this? Paulo Matos > -----Original Message----- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Paulo > Matos > Sent: 26 June 2013 15:08 > To: gcc@gcc.gnu.org > Subject: Delay scheduling due to possible future multiple issue in VLIW > > Hello, > > We have a port for a VLIW machine using gcc head 4.8 with an maximum issue of > 2 per clock cycle (sometimes only 1 due to machine constraints). > We are seeing the following situation in sched2: > > ;; --------------- forward dependences: ------------ > > ;; --- Region Dependences --- b 3 bb 0 > ;; insn code bb dep prio cost reservation > ;; ---- ---- -- --- ---- ---- ----------- > ;; 38 1395 3 0 6 4 > (p0+long_imm+ldst0+lock0),nothing*3 : 44m 43 41 40 > ;; 40 491 3 1 2 2 (p0+long_imm+ldst0+lock0),nothing > : 44m 41 > ;; 41 536 3 2 1 1 (p0+no_stl2)|(p1+no_dual) : 44 > ;; 43 1340 3 1 2 1 (p0+no_stl2)|(p1+no_dual) : 44m > ;; 44 1440 3 4 1 1 (p0+long_imm) : > > ;; dependencies resolved: insn 38 > ;; tick updated: insn 38 into ready > ;; dependencies resolved: insn 41 > ;; tick updated: insn 41 into ready > ;; Advanced a state. > ;; Ready list after queue_to_ready: 41:4 38:2 > ;; Ready list after ready_sort: 41:4 38:2 > ;; Ready list (t = 0): 41:4 38:2 > ;; Chosen insn : 38 > ;; 0--> b 0: i 38r1=zxn([r0+`b']) > :(p0+long_imm+ldst0+lock0),nothing*3 > ;; dependencies resolved: insn 43 > ;; Ready-->Q: insn 43: queued for 4 cycles (change queue index). > ;; tick updated: insn 43 into queue with cost=4 > ;; dependencies resolved: insn 40 > ;; Ready-->Q: insn 40: queued for 4 cycles (change queue index). > ;; tick updated: insn 40 into queue with cost=4 > ;; Ready-->Q: insn 41: queued for 1 cycles (resource conflict). > ;; Ready list (t = 0): > ;; Advanced a state. > ;; Q-->Ready: insn 41: moving to ready without stalls > ;; Ready list after queue_to_ready: 41:4 > ;; Ready list after ready_sort: 41:4 > ;; Ready list (t = 1): 41:4 > ;; Chosen insn : 41 > ;; 1--> b 0: i 41r0=r0+0x4 > :(p0+no_stl2)|(p1+no_dual) > > So, it is scheduling first insn 38 followed by 41. > The insn chain for bb3 before sched2 looks like: > (insn 38 36 40 3 (set (reg:DI 1 r1) > (zero_extend:DI (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] > [119]) > (symbol_ref:SI ("b") [flags 0x80] <var_decl > 0x2b9c011f75a0 b>)) [2 MEM[symbol: b, index: ivtmp.13_7, offset: 0B]+0 S4 > A32]))) pr3115b.c:13 1395 {zero_extendsidi2} > (nil)) > (insn 40 38 41 3 (set (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] > [119]) > (symbol_ref:SI ("a") [flags 0x80] <var_decl 0x2b9c011f7500 > a>)) [2 MEM[symbol: a, index: ivtmp.13_7, offset: 0B]+0 S4 A32]) > (reg:SI 1 r1 [orig:118 D.3048 ] [118])) pr3115b.c:13 491 {fp_movsi} > (nil)) > (insn 41 40 43 3 (set (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119]) > (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119]) > (const_int 4 [0x4]))) 536 {addsi3} > (nil)) > (insn 43 41 44 3 (set (reg:BI 64 p0 [122]) > (ne:BI (reg:SI 1 r1 [orig:118 D.3048 ] [118]) > (const_int 0 [0]))) pr3115b.c:13 1340 {cmp_simode} > (expr_list:REG_DEAD (reg:SI 1 r1 [orig:118 D.3048 ] [118]) > (nil))) > (jump_insn 44 43 55 3 (set (pc) > (if_then_else (ne (reg:BI 64 p0 [122]) > (const_int 0 [0])) > (label_ref:SI 35) > (pc))) pr3115b.c:13 1440 {cbranchbi4} > (expr_list:REG_DEAD (reg:BI 64 p0 [122]) > (expr_list:REG_BR_PROB (const_int 9844 [0x2674]) > (expr_list:REG_PRED_WIDTH (const_int 4 [0x4]) > (nil)))) > > > The problem with this is that GCC is scheduling insn 38, followed by 41, (a > patched) 40, 43 and 44. > However, if it had delayed scheduling 41, waited a clock cycle, issued 40 > then it would be able to issue 38 paired with 43 in the same clock cycle and > then 44. > So, instead of generating the following insn chain: > 38 > 41 > patched 40 > 43 > 44 > > it would generate > 38 > 40 > 41 : 43 > 44 > > Is there a way to instruct the scheduler to wait on an instruction on a given > clock cycle (even if that instruction is the only one on the ready list) > because it's possible that it can be paired with a later instruction in the > chain if issued simultaneously? > > Cheers, > > Paulo Matos >