On Mon, Dec 19, 2011 at 4:28 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Hi Revital, > > Revital Eres <revital.e...@linaro.org> writes: >> The attached patch is a resubmission following comments made by Ayal >> and Richard. >> >> Tested and bootstrap with the other patches in the series on >> ppc64-redhat-linux, enabling SMS on loops with SC 1. >
The modulo-sched parts are approved, with some comments provided below. Moving get_regno_pressure_class() from loop-invariant.c to ira.c and other ira.h/ira-costs.c changes are not for me to approve, but they make sense to me. > Looks really good to me. I'll leave any approval to Ayal though. > Just one suggestion: > >> + /* Handle register moves. */ while we're at it, the comment above should be assigned to the "then" part below: >> + if (ps_i->id >= ps->g->num_nodes) >> + { >> + int old_regno = REGNO (ps_reg_move (ps, ps_i->id)->old_reg); >> + int new_regno = REGNO (ps_reg_move (ps, ps_i->id)->new_reg); >> + >> + change_pressure (old_regno, true); >> + change_pressure (new_regno, false); >> + change_pressure (old_regno, true); >> + } >> + else >> + { >> + for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++) >> + change_pressure (DF_REF_REGNO (*use_rec), true); >> + >> + for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++) >> + change_pressure (DF_REF_REGNO (*def_rec), false); >> + >> + for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++) >> + change_pressure (DF_REF_REGNO (*use_rec), true); >> + } > > It might be worth adding a comment to say why the code is doing it this way. Indeed, this does raise an eyebrow, and your explanation below is great. Suggest to start by saying that "The last two steps" are the natural way one would go about updating live registers in a bottom-up scan, except that in some cases (iiuc) the same physical register cannot be assigned to both use and def on same insn, so the first step is added conservatively. > E.g.: > > /* Process all uses, all defs, and then all uses again. The first > two steps give us an estimate of the pressure when dying inputs > cannot be tied to outputs (which is the worst case). The second > two steps update the set of live registers ready for the next > instruction. */ > > Also, as a general "future direction" comment, I think we should seriously > consider turning both this and -fmodulo-sched-allow-regmoves on by default, > so that -fmodulo-sched uses them unless explicitly told not to. The results > for ARM seemed to suggest that it is now the best SMS mode (although the > results can't be shared, unfortunately). It'd be unfortunate if users > had to write: > > -fmodulo-sched -fmodulo-sched-allow-regmoves -fmodulo-sched-reg-pressure > > instead of plain: > > -fmodulo-sched > > It might make sense as a follow-on patch rather than part of the main commit. > I'd be in favor, provided this works well for other platforms as well of-course. Ignoring the potential increase of register pressure inside a loop is being incautiously optimistic... It would be great to document testcases where spillage is detected; the original motivation of SMS is to reduce size of register live ranges so as to prevent spillage. There may be other means of preventing or mitigating spills other than bumping the II (e.g., "Minimum register requirements for a module schedule" Micro'94), which could be devised if concrete examples are analyzed. > There's also the separate debate about whether SMS should now be enabled > by default for ARM at -O3, but that's for another day... > Guess so... ;-) Thanks. > Richard Here are some more comments: +change_pressure (int regno, bool incr_p) ^^^^^^ I realize this was taken from elsewhere, but maybe "update_pressure" or "update_current_pressure" would better fit here. +{ + int nregs; + enum reg_class pressure_class; + + if (regno < FIRST_PSEUDO_REGISTER + && (TEST_HARD_REG_BIT (ira_no_alloc_regs, regno) + || TEST_HARD_REG_BIT (eliminable_regset, regno))) + return; + + /* Update the current set of live registers. Exit early if nothing + has changed. */ please clarify: we want to increase curr_reg_pressure as we scan upwards and encounter a last use; if regno is already live, this use is not last. likewise, we want to decrease curr_reg_pressure as we encounter a def; regno might not be live when !incr_p, if the def feeds only uses in next iteration. + if (incr_p + ? !bitmap_set_bit (&curr_regs_live, regno) + : !bitmap_clear_bit (&curr_regs_live, regno)) + return; + + pressure_class = get_regno_pressure_class (regno, &nregs); + + if (!incr_p) + curr_reg_pressure[pressure_class] -= nregs; + else + curr_reg_pressure[pressure_class] += nregs; + sounds a bit more natural to ask "if (incr_p)"...; and only if so, check also if the increased current pressure exceeds max: + if (max_reg_pressure[pressure_class] < curr_reg_pressure[pressure_class]) + max_reg_pressure[pressure_class] = curr_reg_pressure[pressure_class]; + that is: if (incr_p) { curr_reg_pressure[pressure_class] += nregs; if (curr_reg_pressure[pressure_class] > max_reg_pressure[pressure_class]) max_reg_pressure[pressure_class] = curr_reg_pressure[pressure_class]; } else curr_reg_pressure[pressure_class] -= nregs; + if (dump_file && flag_modulo_sched_verbose > 0) + { + if (incr_p) + fprintf (dump_file, "Increase pressure: regno %d %s(P%d,M%d)\n", + regno, reg_class_names[pressure_class], + curr_reg_pressure[pressure_class], + max_reg_pressure[pressure_class]); + else + fprintf (dump_file, "Decrease pressure: regno %d %s(P%d,M%d)\n", + regno, reg_class_names[pressure_class], + curr_reg_pressure[pressure_class], + max_reg_pressure[pressure_class]); + } wanna also print "nregs"? +} + +/* Update the register class information for the register moves in PS. */ anything interesting to explain about how reg_moves are handled? Otherwise comment is redundant. +static void +update_reg_moves_pressure_info (partial_schedule_ptr ps) ^^^^^^ initialize ... +/* Initialize the data-structures needed for the register pressure + calculation. */ +static void +initiate_reg_pressure_info (partial_schedule_ptr ps) ^^^^^^^^ initialize ... +/* Return TRUE if PS has register pressure. Otherwise return FALSE. ^^^^^^^^^^^^^^^^^^^^^ will spill a register + LOOP is the original loop. */ +bool +ps_will_spill_p (struct loop *loop, partial_schedule_ptr ps) +{ + bool pressure_p = false; + int i; + bitmap_iterator bi; + unsigned j; + + /* The register moves instructions do not appear in the basic-block yet + only in the PS however we need to substitute their uses in the basic + block in order to calculate the pressure correctly. We must undo + this at the end of the function because the loop should later on + be versioned without the substitutions. */ also, a predicate "ps_will_spill_p()" function should not transform its arguments. + apply_reg_moves (ps, false); + initiate_reg_pressure_info (ps); + + /* The register pressure is calculated by calling calc_liveness twice. */ complete the sentence ... after the first round, curr_reg_pressure includes all "upwards exposed" registers in the ps; feeding them into the second round will account for cross-iteration dependences. + if (dump_file && flag_modulo_sched_verbose > 0) + { + fprintf (dump_file, "\nStarting register pressure estimation\n"); + fprintf (dump_file, "Pass 1 of liveness calculation\n"); + fprintf (dump_file, "===============================\n"); + fprintf (dump_file, "\nlive regs:\n"); + EXECUTE_IF_SET_IN_BITMAP (&curr_regs_live, 0, j, bi) + fprintf (dump_file, "%d ", j); + fprintf (dump_file, "\n"); + } + why not fold most/all of the above dumps into calc_liveness()? + calc_liveness (ps); + + if (dump_file && flag_modulo_sched_verbose > 0) + { + fprintf (dump_file, "\nPass 2 of liveness calculation\n"); + fprintf (dump_file, "===============================\n"); + fprintf (dump_file, "\nlive regs:\n"); + EXECUTE_IF_SET_IN_BITMAP (&curr_regs_live, 0, j, bi) + fprintf (dump_file, "%d ", j); + fprintf (dump_file, "\n"); + } + + calc_liveness (ps); + + if (dump_file) + { + struct loop *parent; + + parent = loop_outer (loop); + fprintf (dump_file, "\n Loop %d (parent %d, header bb%d, depth %d)\n", + loop->num, (parent == NULL ? -1 : parent->num), + loop->header->index, loop_depth (loop)); + } + + if (dump_file) + fprintf (dump_file, "Pressure:\n"); + fuse with the above "if (dump_file)". + for (i = 0; i < ira_pressure_classes_num; i++) + { + enum reg_class pressure_class; + + pressure_class = ira_pressure_classes[i]; + + if (max_reg_pressure[pressure_class] == 0) + continue; + + if (dump_file) + fprintf (dump_file, "%s=%d %d ", reg_class_names[pressure_class], + max_reg_pressure[pressure_class], + ira_available_class_regs[pressure_class]); + + if (max_reg_pressure[pressure_class] + > ira_class_hard_regs_num[pressure_class]) + { + if (dump_file) + fprintf (dump_file, "(pressure)\n"); + + pressure_p = true; you can "break;" now. however, you have everything setup to compute the amount of spill, so suggest to do the following instead: pressure += max_reg_pressure[pressure_class] - ira_class_hard_regs_num[pressure_class]); or better call the variable "spillage" instead of "pressure", and return it. Current use will only check if it's zero or positive. Future use, as discussed offline, should compare this to the amount of spillage the original loop (w/o modulo-scheduling) is expected to have. + } + } + + bitmap_clear (&curr_regs_live); + undo_reg_moves (ps); + return pressure_p; +} Thanks, Ayal.