The AArch64 port uses SCHED_GROUP to mark instructions that get fused at issue time, to ensure that they will be issued together. However, in the scheduler, use of a SCHED_GROUP forces all other instructions to issue in the next cycle. This is wrong for AArch64 ports using insn fusing which can issue multiple insns per cycle, as aarch64 SCHED_GROUP insns can all issue in the same cycle, and other insns can issue in the same cycle also.
I put a testcase and some info in bug 81434. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81434 The attached patch fixes the problem. The behavior in pass == 0 is same as now. All non sched group insns are ignored, and all sched group insns are checked to see if they need to be queued for a latter cycle. The difference is in the second pass where non sched group insns are queued for a latter cycle only if there is a sched group insn that got queued. Since sched group insns always sort to the top of the list of insns to schedule, all sched group insns still get scheduled together as before. This has been tested with an Aarch64 bootstrap and make check. OK? Jim
2017-07-13 Jim Wilson <jim.wil...@linaro.org> PR rtl-optimization/81434 * haifa-sched.c (prune_ready_list): Init min_cost_group to 0. Update comment for main loop. In sched_group_found if, also add checks for pass and min_cost_group. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 1b13e32..f6369d9 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -6300,7 +6300,7 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p, { int i, pass; bool sched_group_found = false; - int min_cost_group = 1; + int min_cost_group = 0; if (sched_fusion) return; @@ -6316,8 +6316,8 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p, } /* Make two passes if there's a SCHED_GROUP_P insn; make sure to handle - such an insn first and note its cost, then schedule all other insns - for one cycle later. */ + such an insn first and note its cost. If at least one SCHED_GROUP_P insn + gets queued, then all other insns get queued for one cycle later. */ for (pass = sched_group_found ? 0 : 1; pass < 2; ) { int n = ready.n_ready; @@ -6330,7 +6330,8 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p, if (DEBUG_INSN_P (insn)) continue; - if (sched_group_found && !SCHED_GROUP_P (insn)) + if (sched_group_found && !SCHED_GROUP_P (insn) + && ((pass == 0) || (min_cost_group >= 1))) { if (pass == 0) continue;