http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53107
             Bug #: 53107
           Summary: scheduling fail
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: m...@gcc.gnu.org

When generating code for testsuite/gcc.c-torture/execute/ieee/pr50310.c, I noticed that all the stores are pushed to the end, where they can't execute simultaneously with other instructions. I have tons of free execution slots around the stores, as the stores have to contend with a relatively narrow off-chip data path to memory. The code looks something like:

;;   --------------- forward dependences: ------------

;;   --- Region Dependences --- b 2 bb 0
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;       18   392     2     2     7     1   cmpcc   : 35 19
;;       19  1633     2     2     6     1   movcc   : 20
;;       20    80     2     2     5     5   stm_4   :
;;       35  1633     2     2     6     1   movcc   : 36
;;       36    80     2     2     5     5   stm_4   :
;;       50   388     2     2     7     1   cmpcc   : 67 51
;;       51  1633     2     2     6     1   movcc   : 52
;;       52    80     2     2     5     5   stm_4   :
;;       67  1633     2     2     6     1   movcc   : 68
;;       68    80     2     2     5     5   stm_4   :
;;       82   389     2     2     7     1   cmpcc   : 99 83
;;       83  1633     2     2     6     1   movcc   : 84
;;       84    80     2     2     5     5   stm_4   :
;;       99  1633     2     2     6     1   movcc   : 100
;;      100    80     2     2     5     5   stm_4   :
[ repeated 10 more times]

with a sequence of 16 of the 3-instruction blocks, as this is an -O3 compile. Most of the costs associated with cmpcc and movcc would be free if they were moved near the stm instructions. The scheduling algorithm sorts and issues the insns based upon prio, so all the 7s (cmpcc) go first, then all the 6s (movcc), and all the stores (stm_4) last. This hurts, and the original ordering would have produced faster code. :-(

I don't know the best way to fix this, as this is just a machine-independent part of the algorithm that dates back to the original "this is how you schedule" paper. It is incomplete and is now overly simplistic for the types of CPUs some people build. The best fix is one that refines the costs in some way.
For example, cmpcc, movcc, stm_4, with the stm_4 staggered 1 group down, when run through the DFA, would lead to the conclusion that the cmpcc and movcc instructions are free. Presently the priority field is a simple addition of the individual costs of the insns, and does not take into account what the DFA knows: that simple addition is a poor substitute for the real cost.
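
To make the clustering concrete, here is a minimal sketch in Python, not the scheduler's actual code. It assumes that an insn's priority is its own cost plus the highest priority among its forward dependences, which reproduces the 7/6/5 values in the dump, and that issue order is a plain stable sort on that priority. The INSNS table is copied from the dump for insns 18 through 68; the helper names priority and issue_order are hypothetical.

```python
# Hypothetical model of two of the cmpcc/movcc/stm_4 groups from the dump.
# insn uid -> (reservation, cost, forward dependences)
INSNS = {
    18: ("cmpcc", 1, [35, 19]),
    19: ("movcc", 1, [20]),
    20: ("stm_4", 5, []),
    35: ("movcc", 1, [36]),
    36: ("stm_4", 5, []),
    50: ("cmpcc", 1, [67, 51]),
    51: ("movcc", 1, [52]),
    52: ("stm_4", 5, []),
    67: ("movcc", 1, [68]),
    68: ("stm_4", 5, []),
}


def priority(uid, memo=None):
    """Assumed rule: own cost plus the highest priority of any successor."""
    if memo is None:
        memo = {}
    if uid not in memo:
        _, cost, succs = INSNS[uid]
        memo[uid] = cost + max((priority(s, memo) for s in succs), default=0)
    return memo[uid]


def issue_order():
    """Stable sort by descending priority; ties keep the original order."""
    return sorted(INSNS, key=lambda uid: -priority(uid))


if __name__ == "__main__":
    for uid in issue_order():
        print(uid, INSNS[uid][0], priority(uid))
```

Under these assumptions the order comes out as 18, 50, 19, 35, 51, 67, 20, 36, 52, 68: both compares, then all four conditional moves, then all four stores. Nothing in the sum tells the sort that a cmpcc or movcc could hide in the slots around a 5-cycle store, which is the information the DFA would have to supply.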