Jon Beniston wrote:
It looks like the DFA pipeline hazard recognizer works well. Even when the data is ready for the stores, there is a 2-cycle delay between stores because the memory unit is reserved by the previous store insn.
Hi Vlad,
There is not enough information to say what is wrong. It would be better if you send gcc output when -fsched-verbose=10 is used.
Cheers, Jon
;; Ready list (t = 10): 32 28 24
;; 10--> 24 [`y']=r43 :x,m*2
;; Ready list (t = 10): 32 28
;; Ready list after queue_to_ready: 32 28
;; Ready list after ready_sort: 32 28
;; Ready list (t = 11): 32 28
;; Ready-->Q: insn 28: queued for 1 cycles.
;; Ready list (t = 11): 32
;; Ready-->Q: insn 32: queued for 1 cycles.
;; Ready list (t = 11):
;; Q-->Ready: insn 32: moving to ready without stalls
;; Q-->Ready: insn 28: moving to ready without stalls
;; Ready list after queue_to_ready: 28 32
;; Ready list after ready_sort: 32 28
;; Ready list (t = 12): 32 28
;; 12--> 28 [`z']=r45 :x,m*2
;; Ready list (t = 12): 32
;; Ready list after queue_to_ready: 32
;; Ready list after ready_sort: 32
;; Ready list (t = 13): 32
;; Ready-->Q: insn 32: queued for 1 cycles.
;; Ready list (t = 13):
;; Q-->Ready: insn 32: moving to ready without stalls
;; Ready list after queue_to_ready: 32
;; Ready list after ready_sort: 32
;; Ready list (t = 14): 32
;; 14--> 32 [`w']=r47 :x,m*2
;; Ready list (t = 14):
;; Ready list (final):
;; total time = 14
;; new head = 18
;; new tail = 32
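The issue cycles in the dump (10, 12, 14) can be reproduced with a toy reservation model. This is only a sketch, not GCC's DFA code: it assumes the reservation string "x,m*2" means unit x in the first cycle followed by unit m for two cycles, and the helper names parse_reservation and issue_in_order are made up for illustration.

```python
def parse_reservation(spec):
    """Expand e.g. 'x,m*2' into per-cycle unit sets: [{'x'}, {'m'}, {'m'}]."""
    cycles = []
    for part in spec.split(","):
        unit, _, count = part.partition("*")
        cycles.extend([{unit.strip()}] * int(count or 1))
    return cycles


def issue_in_order(insns, start, spec):
    """Issue insns in the given order, each at the earliest conflict-free cycle."""
    pattern = parse_reservation(spec)
    busy = {}    # cycle -> set of units already reserved in that cycle
    issued = {}  # insn -> issue cycle
    t = start
    for insn in insns:
        # Stall while any cycle of this insn's reservation hits a busy unit.
        while any(units & busy.get(t + i, set())
                  for i, units in enumerate(pattern)):
            t += 1
        for i, units in enumerate(pattern):
            busy.setdefault(t + i, set()).update(units)
        issued[insn] = t
    return issued


# The three stores from the dump, all ready at t = 10.
schedule = issue_in_order([24, 28, 32], start=10, spec="x,m*2")
print(schedule)  # {24: 10, 28: 12, 32: 14}
```

Under these assumptions each store has to wait until the previous store's two m cycles no longer overlap its own, which gives the one-cycle "queued for 1 cycles" stall between consecutive stores.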
The problem is in the heuristics used to sort ready insns. The highest-priority heuristic is critical path length. Loads have the biggest value, 6, then additions have value 4, and finally stores have value 3.
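As a rough illustration of how a critical-path priority yields the values 6, 4 and 3, here is a minimal sketch. The latencies (load 2, add 1, store 3), the insn names and the load -> add -> store chains are assumptions chosen only so that the numbers match; they are not taken from the actual machine description.

```python
from functools import lru_cache

# Hypothetical latencies, picked so the priorities come out as 6, 4, 3.
LATENCY = {"load": 2, "add": 1, "store": 3}

# Hypothetical insns: two independent load -> add -> store chains.
KIND = {
    "ld_y": "load", "ld_z": "load",
    "add_y": "add", "add_z": "add",
    "st_y": "store", "st_z": "store",
}
SUCCS = {
    "ld_y": ("add_y",), "ld_z": ("add_z",),
    "add_y": ("st_y",), "add_z": ("st_z",),
    "st_y": (), "st_z": (),
}


@lru_cache(maxsize=None)
def priority(insn):
    """Length of the longest latency-weighted path from insn to the block end."""
    longest_tail = max((priority(s) for s in SUCCS[insn]), default=0)
    return LATENCY[KIND[insn]] + longest_tail


# Sort highest priority first: loads, then additions, then stores.
ranking = sorted(SUCCS, key=priority, reverse=True)
print([(insn, priority(insn)) for insn in ranking])
```

With this ordering every store sorts behind all loads and additions, so when several stores become ready together they have to share the memory unit back to back.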
So the heuristic does not work well for you. But the experience of many compiler developers shows that it is the best heuristic for list insn scheduling. It could be fixed by using other, more sophisticated algorithms or optimal algorithms, which as a rule cannot be used in an industrial compiler because they are too slow.
Vlad