2011/8/8 Richard Sandiford <richard.sandif...@linaro.org>
>
> Ayal Zaks <ayal.z...@gmail.com> writes:
> >> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
> >> inclusive for the loop, then make the "end" exclusive when setting the
> >> out parameters.  (Note that there shouldn't be any problem with overflow
> >> when making the bound exclusive, because the size of the range has been
> >> restricted to "ii" by that stage.  The follow-on patch does use
> >> saturating maths for all ops though.)
> >
> > Sure, no problem having "end" inclusive, as it simplifies things. We
> > can even keep it inclusive all the way through, iterating over: for (c
> > = start; c != end+step; c += step)...
>
> Yeah, I'd prefer that too.  I might do it once Revital's and
> Roman's changes are in.
>
> >> I don't have powerpc hardware that I can do meaningful performance
> >> testing on, but I did run it through a Popular* Embedded Benchmark
> >> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
> >> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
> >> a benchmark that does benefit from modulo scheduling, in some cases
> >> by a significant amount.)
> >>
> > Ok, the patch should be neutral performance-wise. One could check btw
> > if SMS produces exactly the same output in both cases, even w/o a
> > powerpc (or any other) HW.
>
> OK.  The patch does change the output a little because of the MEM_DEP
> range thing.


Ahh, right.

>
>  To compensate for that, I tried compiling libav on ARM with:
>
> Index: gcc/modulo-sched.c
> ===================================================================
> --- gcc/modulo-sched.c  2011-08-05 11:23:16.000000000 +0100
> +++ gcc/modulo-sched.c  2011-08-05 11:27:57.000000000 +0100
> @@ -1680,7 +1680,7 @@ get_sched_window (partial_schedule_ptr p
>                          p_st, early_start, e->latency);
>
>              if (e->data_type == MEM_DEP)
> -               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> +               end = MIN (end, SCHED_TIME (v_node) + ii);
>            }
>          else if (dump_file)
>             fprintf (dump_file, "the node is not scheduled\n");
> @@ -1729,7 +1729,7 @@ get_sched_window (partial_schedule_ptr p
>                          s_st, late_start, e->latency);
>
>              if (e->data_type == MEM_DEP)
> -               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> +               end = MAX (end, SCHED_TIME (v_node) - ii);
>              if (dump_file)
>                  fprintf (dump_file, "end = %d\n", end);
>
> @@ -1791,7 +1791,7 @@ get_sched_window (partial_schedule_ptr p
>                 count_preds++;
>
>              if (e->data_type == MEM_DEP)
> -               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> +               end = MIN (end, SCHED_TIME (v_node) + ii);
>            }
>           else if (dump_file)
>             fprintf (dump_file, "the node is not scheduled\n");
>
> applied, then tried the same thing with the revised patch below.
> The output was the same.


ok, that's a good sanity check.

>
> (FWIW, libav did show up extra differences when using the patch
> that I'd originally submitted.  They were due to the count_preds
> and count_succs thing that you picked up in your review.)


(These differences had no noticable consequences performance-wise, right?)

>
> >> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
> >> additional patch:
> >>
> >> Index: gcc/opts.c
> >> ===================================================================
> >> --- gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> >> +++ gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> >> @@ -449,6 +449,8 @@ static const struct default_options defa
> >>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> >>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
> >>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
> >> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
> >> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
> >>
> >>     /* -O2 optimizations.  */
> >>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
> >>
> >> applied for both the "before" and "after" runs.  OK to install?
> >
> >
> > Yes, with just a couple of minor non-mandatory comments:
> > 1. Setting "count_preds = psp_not_empty;" and "count_succs =
> > pss_not_empty;" suggests that you want to decrement non-interesting
> > neighbors instead of incrementing interesting ones. Otherwise can
> > initialize to zero (doesn't really matter, just a bit confusing).
>
> Yeah, good point.  I initialised them to zero, like you said,
> and changed the reversal condition to:
>
>  if (pss_not_empty && count_succs >= count_preds)
>
> Which I have admit makes a lot more sense than what I submitted,
> and avoids some unwanted changes in output.
>
> > 2. I find it easier to read "late_start = MIN (late_start, early_start
> > + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
> > late_start);".
>
> Oops, that was accidental.  Fixed.
>
> > 3. Suggest to set step=1 at the beginning, explaining from the start
> > that we first compute a forward range (start<end), and may later
> > possibly revert it.
>
> Yeah, that's better.
>
> > 4. "late_start" would better be renamed "late_end".
>
> Since you said this was optional, I decided to leave it as it is.
> The name and value of "late start" come directly from the paper.


Ok. We can write another paper ;-)

>
> > 5. While we're at it, as you've unified the treatments, we can now
> > actually do with "start" and "end" only, and do away with early_start
> > and late_start altogether...
>
> The IV patch needs the same distinction as we have now, so I'd prefer
> to keep it.


Ok, looking forward to the IV patch then...

>
> Thanks for the review.  Here's what I applied after testing on
> powerpc-ibm-aix5.3.
>


Thanks for looking into this,
Ayal.


>
> Richard
>
>
> gcc/
>        * modulo-sched.c (get_sched_window): Use just one loop for predecessors
>        and one loop for successors.  Fix upper bound of memory range.
>
> Index: gcc/modulo-sched.c
> ===================================================================
> *** gcc/modulo-sched.c  2011-08-05 10:39:27.000000000 +0100
> --- gcc/modulo-sched.c  2011-08-05 10:40:49.000000000 +0100
> *************** #define MAX_SPLIT_NUM 10
> *** 1630,1638 ****
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p, int 
> *end_p)
>  {
>    int start, step, end;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> --- 1630,1640 ----
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p,
> !                 int *end_p)
>  {
>    int start, step, end;
> +   int early_start, late_start;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1640,1645 ****
> --- 1642,1649 ----
>    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
>    int psp_not_empty;
>    int pss_not_empty;
> +   int count_preds;
> +   int count_succs;
>
>    /* 1. compute sched window for u (start, end, step).  */
>    sbitmap_zero (psp);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1647,1861 ****
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   if (psp_not_empty && !pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
> !
> !           if (dump_file)
> !             {
> !             fprintf (dump_file, "\nProcessing edge: ");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid,
> !                      INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !               early_start =
> !                 MAX (early_start, p_st + e->latency - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency: %d",
> !                          p_st, early_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !          else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
> !       }
> !       start = early_start;
> !       end = MIN (end, early_start + ii);
> !       /* Schedule the node close to it's predecessors.  */
> !       step = 1;
> !
> !       if (dump_file)
> !         fprintf (dump_file,
> !                "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                u_node->cuid, INSN_UID (u_node->insn), start, end, step);
> !     }
> !
> !   else if (!psp_not_empty && pss_not_empty)
> !     {
> !       int late_start = INT_MAX;
> !
> !       end = INT_MIN;
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
> !
> !           if (dump_file)
> !             {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !               fprintf (dump_file,
> !                        "\nScheduling %d (%d) in pss_not_empty,"
> !                        " checking s %d (%d): ", u_node->cuid,
> !                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                        (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
> !
> !               late_start = MIN (late_start,
> !                                 s_st - e->latency + (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> !              if (dump_file)
> !                  fprintf (dump_file, "end = %d\n", end);
> !
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = late_start;
> !       end = MAX (end, late_start - ii);
> !       /* Schedule the node close to it's successors.  */
> !       step = -1;
>
> !       if (dump_file)
> !         fprintf (dump_file,
> !                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
>
> !     }
>
> !   else if (psp_not_empty && pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !       int late_start = INT_MAX;
> !       int count_preds = 0;
> !       int count_succs = 0;
>
> !       start = INT_MIN;
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
>              fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !             early_start = MAX (early_start,
> !                                p_st + e->latency
> !                                - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency = %d",
> !                          p_st, early_start, e->latency);
> !
> !               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                 count_preds++;
>
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking s %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
>
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
>
> !             late_start = MIN (late_start,
> !                               s_st - e->latency
> !                               + (e->distance * ii));
>
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
>
> !                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                  count_succs++;
>
> !             if (e->data_type == MEM_DEP)
> !               start = MAX (start, SCHED_TIME (v_node) - ii + 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = MAX (start, early_start);
> !       end = MIN (end, MIN (early_start + ii, late_start + 1));
> !       step = 1;
> !       /* If there are more successors than predecessors schedule the
> !          node close to it's successors.  */
> !       if (count_succs >= count_preds)
> !         {
> !           int old_start = start;
>
> !           start = end - 1;
> !           end = old_start - 1;
> !           step = -1;
> !         }
> !     }
> !   else /* psp is empty && pss is empty.  */
> !     {
> !       start = SCHED_ASAP (u_node);
> !       end = start + ii;
> !       step = 1;
>      }
>
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> --- 1651,1772 ----
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   /* We first compute a forward range (start <= end), then decide whether
> !      to reverse it.  */
> !   early_start = INT_MIN;
> !   late_start = INT_MAX;
> !   start = INT_MIN;
> !   end = INT_MAX;
> !   step = 1;
> !
> !   count_preds = 0;
> !   count_succs = 0;
> !
> !   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
> !   if (psp_not_empty)
> !     for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !       ddg_node_ptr v_node = e->src;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge: ");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in psp_not_empty,"
> !                    " checking p %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int p_st = SCHED_TIME (v_node);
>
> !           early_start = MAX (early_start,
> !                              p_st + e->latency - (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             end = MIN (end, p_st + ii - 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_preds++;
>
> !           if (dump_file)
>              fprintf (dump_file,
> !                      "pred st = %d; early_start = %d; latency: %d;"
> !                      " end: %d\n", p_st, early_start, e->latency, end);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
> !   if (pss_not_empty)
> !     for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !       ddg_node_ptr v_node = e->dest;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge:");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in pss_not_empty,"
> !                    " checking s %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int s_st = SCHED_TIME (v_node);
>
> !           late_start = MIN (late_start,
> !                             s_st - e->latency + (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             start = MAX (start, s_st - ii + 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_succs++;
>
> !           if (dump_file)
> !             fprintf (dump_file,
> !                      "succ st = %d; late_start = %d; latency = %d;"
> !                      " start=%d", s_st, late_start, e->latency, start);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Get a target scheduling window no bigger than ii.  */
> !   if (early_start == INT_MIN && late_start == INT_MAX)
> !     early_start = SCHED_ASAP (u_node);
> !   else if (early_start == INT_MIN)
> !     early_start = late_start - (ii - 1);
> !   late_start = MIN (late_start, early_start + (ii - 1));
> !
> !   /* Apply memory dependence limits.  */
> !   start = MAX (start, early_start);
> !   end = MIN (end, late_start);
> !
> !   /* If there are at least as many successors as predecessors, schedule the
> !      node close to its successors.  */
> !   if (pss_not_empty && count_succs >= count_preds)
> !     {
> !       int tmp = end;
> !       end = start;
> !       start = tmp;
> !       step = -1;
>      }
>
> +   /* Now that we've finalized the window, make END an exclusive rather
> +      than an inclusive bound.  */
> +   end += step;
> +
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> *************** get_sched_window (partial_schedule_ptr p
> *** 1867,1876 ****
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !     return -1;
>      }
>
> !     return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
> --- 1778,1787 ----
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !       return -1;
>      }
>
> !   return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

Reply via email to